Recipes for stable linear embeddings from Hilbert spaces to $\mathbb{R}^m$

Gilles Puy, Mike E. Davies, and Remi Gribonval 


Abstract: We consider the problem of constructing a linear map from a Hilbert space $\mathcal{H}$ (possibly infinite dimensional) to $\mathbb{R}^m$ that satisfies a restricted isometry property (RIP) on an arbitrary signal model, i.e., a subset of $\mathcal{H}$. We present a generic framework that handles a large class of low-dimensional subsets but also unstructured and structured linear maps. We provide a simple recipe to prove that a random linear map satisfies a general RIP with high probability. We also describe a generic technique to construct linear maps that satisfy the RIP. Finally, we detail how to use our results in several examples, which allow us to recover and extend many known compressive sampling results.

Index Terms: Compressed sensing, restricted isometry property, box-counting dimension.

I. Introduction

The restricted isometry property (RIP) is at the core of many theoretical developments in compressive sensing (CS). It allows one to show that sparse signals can be “captured” by few linear and non-adaptive measurements and recovered by non-linear decoders [?]. In a finite dimensional space, a matrix $A \in \mathbb{R}^{m \times n}$ satisfies the RIP on a general set $\mathcal{S} \subset \mathbb{R}^n$ if there exists a constant $\delta \in (0,1)$ such that, for all $x \in \mathcal{S}$,

$$ (1-\delta)\,\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\,\|x\|_2^2. \qquad (1) $$

Random matrices with independent entries drawn from the centered Gaussian distribution with variance $m^{-1}$ are examples of matrices that satisfy the RIP with high probability for many low-dimensional sets such as $\mathcal{S} = \Sigma_{2k}$, associated to $k$-sparse signals, $\mathcal{S} = \Sigma_{2r}$, associated to rank-$r$ matrices [?], or $\mathcal{S} = \{x_1 - x_2 \mid x_1, x_2 \in \Sigma\}$, associated to a compact Riemannian manifold $\Sigma$ [?]. In these scenarios, the RIP holds for a number of measurements $m$ essentially proportional to a measure of intrinsic dimension of $\mathcal{S}$.

In this work, we extend the construction of linear maps that satisfy the RIP in a finite-dimensional ambient space to linear maps that satisfy the RIP on subsets of a possibly infinite-dimensional space. This extension of the CS theory to infinite-dimensional spaces is important to properly apply CS in an analog setting [?], to explore connections with the sampling of signals with finite rate of innovation, or in machine learning to develop efficient methods to compute information-preserving sketches of probability distributions [?], [?]. As an example of the application of our results, we will explain in Section V-A how to build a stable linear embedding of signals that are sparse in the Haar wavelet basis of $L_2([0,1])$ using a sampling in the Fourier basis. Note that this type of signal model and sampling is often used to study the theoretical aspects of CS-MRI acquisition.


A. The normalised secant set 

The RIP is a very convenient tool when one wants to prove that a signal $x \in \mathcal{H}$ can be reconstructed from its compressed measurements $Ax$ when it belongs to a given model set, $x \in \Sigma \subset \mathcal{H}$. If $A$ satisfies the RIP on the set of $2k$-sparse vectors then every $k$-sparse vector $x$ can be accurately and stably recovered from its noisy measurements $Ax + n$ by solving the Basis Pursuit problem [?]. For more general low-dimensional signal models $\Sigma$, one can prove that a signal $x \in \Sigma$ can be stably recovered from its compressed measurements if the matrix $A$ satisfies the RIP on the secant set $\mathcal{S} = \Sigma - \Sigma$ [?], [?], i.e., if there exists a constant $\delta \in (0,1)$ such that

$$ (1-\delta)\,\|x_1 - x_2\|_2^2 \le \|A(x_1 - x_2)\|_2^2 \le (1+\delta)\,\|x_1 - x_2\|_2^2 $$

for all $x_1, x_2 \in \Sigma$. When this condition is satisfied, we say that the matrix $A$ stably embeds the set $\Sigma$ in $\mathbb{R}^m$.

Let us recall that the secant set of $\Sigma$ is defined as

$$ \Sigma - \Sigma := \{\, y = x_1 - x_2 \mid x_1, x_2 \in \Sigma \,\}. $$

One can remark that the above condition for stable recovery is equivalent to

$$ \sup_{z \in \mathcal{S}(\Sigma)} \left|\, \|Az\|_2^2 - 1 \,\right| \le \delta, \qquad (2) $$

where

$$ \mathcal{S}(\Sigma) := \{\, z = y/\|y\|_2 \mid y \in (\Sigma - \Sigma) \setminus \{0\} \,\} $$

is the normalised secant set of $\Sigma$. This indicates that the set of interest to prove that a matrix $A$ stably embeds the set $\Sigma$ is not directly $\Sigma$, but rather its normalised secant set $\mathcal{S}(\Sigma)$.

From now on, we concentrate on proving a generalised version of (2) for an arbitrary set $\mathcal{S}$ lying on the unit sphere in the ambient space. If one wants to prove that a matrix or, more generally, a linear map stably embeds a set $\Sigma$, one just needs to substitute $\mathcal{S}(\Sigma)$ for $\mathcal{S}$ in the following results.

In this paper, we consider that the ambient space is an infinite-dimensional Hilbert space denoted by $\mathcal{H}$. The inner product in $\mathcal{H}$ is denoted by $\langle \cdot, \cdot \rangle$ and the associated norm by $\|\cdot\|$. Vectors in $\mathcal{H}$ are given a bold letter while vectors in finite-dimensional Euclidean spaces are given a regular letter. We denote by $\mathbb{S}$ the unit sphere in $\mathcal{H}$. We assume everywhere that $\mathcal{S} \subset \mathbb{S}$: $\|x\| = 1$ for all $x \in \mathcal{S}$.

B. Measuring the dimension of S 

All the developments in this work are based on the assumption that $\mathcal{S}$ has a small intrinsic dimension. In the literature, several definitions of dimension exist. The reader can refer to, e.g., the monograph of Robinson [?] for an exhaustive list of definitions. In an infinite-dimensional space, one should be careful with the definition of dimension used. Indeed, as described in [?], there are examples of sets for which no stable linear embedding to a finite dimensional space exists even though their dimension is finite (according to some definition). Therefore, there is no hope to construct a linear map that satisfies the RIP for these sets.

In this paper, we use the upper box-counting dimension as our measure of dimension. The upper box-counting dimension is also at the centre of most of the developments in [?].

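Definition I.1 (Covering number). Let $\epsilon > 0$. The covering number $N_{\mathcal{S}}(\epsilon)$ of $\mathcal{S}$ is the minimum number of closed balls of radius $\epsilon$, with respect to the norm $\|\cdot\|$, with centres in $\mathcal{S}$ needed to cover $\mathcal{S}$. The set of centres of these balls is a minimal $\epsilon$-net for $\mathcal{S}$.

Definition I.2 (Upper box-counting dimension). The upper box-counting dimension of $\mathcal{S}$ is

$$ \dim(\mathcal{S}) := \limsup_{\epsilon \to 0} \; \log[N_{\mathcal{S}}(\epsilon)] \,/\, \log[1/\epsilon]. $$

From the above definition, one can remark that if $d > \dim(\mathcal{S})$ then there exists $\epsilon_0 > 0$ such that $N_{\mathcal{S}}(\epsilon) \le \epsilon^{-d}$ for all $\epsilon \le \epsilon_0$. In this paper, we make the following assumption on $\mathcal{S}$.

Assumption A. The set $\mathcal{S} \subset \mathbb{S}$ has a finite upper box-counting dimension $\dim(\mathcal{S})$ which is strictly bounded by $s \ge 1$: $\dim(\mathcal{S}) < s$.

Therefore, there exists a model-set dependent constant $\epsilon_{\mathcal{S}} \in (0, 1/2)$ such that $N_{\mathcal{S}}(\epsilon) \le \epsilon^{-s}$ for all $\epsilon \le \epsilon_{\mathcal{S}}$.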
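To make Definitions I.1 and I.2 concrete, here is a minimal numerical sketch. It is an illustration under assumptions that are not part of the paper: the toy set is the unit circle in $\mathbb{R}^2$, sampled on a grid, and the covering number is replaced by the size of a greedy $\epsilon$-net (a valid cover with centres in the set, hence an upper bound on $N_{\mathcal{S}}(\epsilon)$ for the sampled set). The helper name `greedy_net_size` is hypothetical.

```python
import numpy as np

def greedy_net_size(points, eps):
    """Size of a greedy eps-net: every point ends up within eps of some centre."""
    centres = []
    for p in points:
        if not centres or np.min(np.linalg.norm(np.asarray(centres) - p, axis=1)) > eps:
            centres.append(p)
    return len(centres)

# Toy set S (an assumption for illustration): the unit circle, sampled on a fine grid.
theta = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

eps_values = np.array([0.2, 0.1, 0.05, 0.025])
net_sizes = np.array([greedy_net_size(circle, e) for e in eps_values])

# The slope of log N(eps) against log(1/eps) approximates the box-counting dimension.
slope = np.polyfit(np.log(1.0 / eps_values), np.log(net_sizes), 1)[0]
print(net_sizes, slope)
```

For this one-dimensional curve the fitted slope should be close to 1, so any $s > 1$ would be admissible in Assumption A for a small enough $\epsilon_{\mathcal{S}}$.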

In many well-behaved cases, the upper box-counting dimension of the normalised secant set $\mathcal{S} = \mathcal{S}(\Sigma)$ of $\Sigma$ is simply twice the upper box-counting dimension of $\Sigma$ (or $\Sigma \cap \mathbb{S}$ if $\Sigma$ is not bounded). For example, the set of normalised $k$-sparse vectors in $\mathbb{R}^n$ has dimension $k$ and its secant set has dimension $2k$. The same result holds for low-rank matrices and low-dimensional manifolds. However, this is not always the case, as we will see in Section IV-B.


C. Contributions and organisation of the paper

This work has four main contributions.

• First, given a set $\mathcal{S}$ that satisfies Assumption A and a random linear map $L : \mathcal{H} \to \mathbb{R}^m$, we factorise the proof of the RIP for $L$ on $\mathcal{S}$ into a small number of steps. We separate the steps that are problem-specific from the ones that are common to most problems and do not need to be repeated every time. This allows us to provide a simple recipe for the proof of the RIP, with techniques similar to those of, e.g., [?]. This is the subject of Section II.

• Second, given an arbitrary set $\mathcal{S}$ that satisfies Assumption A, we propose a generic construction of linear maps that satisfy the RIP on $\mathcal{S}$. This construction is made of two steps. In the first step, we show that one can always find a linear subspace of large but finite dimension that accurately approximates $\mathcal{S}$. We then use this subspace to build a linear map that sends the vectors in $\mathcal{S}$ into a space of finite but potentially large dimension while preserving their norm to a prescribed accuracy. The second step consists in reducing the embedding dimension further with a random matrix. This construction is detailed in Section III.

• Third, we show how to use the developed techniques to recover and extend many known CS results. In particular, we show that our recipe is general enough to handle structured measurement strategies such as the ones proposed in [?], [?]. These examples are presented in Section IV.

• Fourth, we show that while it is sufficient to have a set $\mathcal{S}$ of finite upper box-counting dimension to guarantee the existence of linear maps that have the RIP on $\mathcal{S}$, this condition is not necessary. This fact is discussed in Section ??.

We discuss related works in Section ?? and conclude in Section ??. All technical proofs can be found in the appendices.


II. Recipe to prove that the RIP holds

In this section, we give a generic recipe to prove that a random linear map $L : \mathcal{H} \to \mathbb{R}^m$ preserves the norm of all vectors in the set $\mathcal{S} \subset \mathbb{S}$.

We suppose that $L$ is built by drawing at random $m$ vectors $(l_1, \ldots, l_m)$ in $\mathcal{H}^m$ using a probability measure $\mu$ on $\mathcal{H}^m$:

$$ L : \mathcal{H} \to \mathbb{R}^m, \qquad x \mapsto \left( \langle l_i, x \rangle \right)_{1 \le i \le m}. \qquad (3) $$

Notice that $L$ is a continuous linear map.

In the following, we start by introducing a generalised form of RIP. Then we detail generic properties of the probability measure $\mu$ which are sufficient to prove that $L$ preserves the norm of all vectors in $\mathcal{S}$ with high probability, provided that $m$ is sufficiently large. Finally, we give a recipe to show that a random linear map preserves the norm of all vectors in $\mathcal{S}$ with high probability.

A. Generalised form of the RIP 

Let us come back to the usual form of the RIP (1), which involves the Euclidean norms in the ambient spaces. First, we notice that it can be rewritten in the following equivalent form:

$$ \left|\, \|Ax\|_2^2 - \|x\|_2^2 \,\right| \le \delta\, \|x\|_2^2. \qquad (4) $$

Second, we remark that for typical random matrices $A$ satisfying this RIP, one has $\mathbb{E}\|Ax\|_2^2 = \|x\|_2^2$. This property is satisfied, e.g., for random matrices $A \in \mathbb{R}^{m \times n}$ whose entries are independent Gaussian variables with zero mean and variance $1/m$, for $\pm 1/\sqrt{m}$ Bernoulli random matrices, or for (rescaled) sensing matrices constructed by selecting independently $m$ vectors from an orthonormal basis using the uniform distribution. For all these cases, inequality (4) becomes

$$ \left|\, \|Ax\|_2^2 - \mathbb{E}\|Ax\|_2^2 \,\right| \le \delta\, \|x\|_2^2. \qquad (5) $$

In this form, we see that the right-hand side of (5) bounds how much $\|Ax\|_2^2$ deviates from its mean. In (5), this deviation is proportional to $\|x\|_2^2$, the square of the Euclidean norm in the ambient space. We replace here the Euclidean norm by the norm $\|\cdot\|$ in the considered Hilbert space. Finally, instead of concentrating only on the $\ell_2$-norm in the measurement space, we will consider arbitrary $\ell_p$-norms, $p \ge 1$.

We define the following semi-norm for all $x \in \mathcal{H}$,

$$ \|x\|_{\mu,p} := \left( \mathbb{E}_\mu \|L(x)\|_p^p \right)^{1/p}. $$

Gathering all the above remarks and adapting them to the case of random linear maps from $\mathcal{H}$ to $\mathbb{R}^m$, we consider the following general form for the desired RIP:

$$ \left|\, \|L(x)\|_p^p - \|x\|_{\mu,p}^p \,\right| \le \delta, \qquad \forall x \in \mathcal{S} \subset \mathbb{S}. $$


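Definition II.1 (RIP). Let $L : \mathcal{H} \to \mathbb{R}^m$ be a linear map. Define

$$ \delta_{\mathcal{S},\mu,p} := \sup_{x \in \mathcal{S}} \left|\, \|L(x)\|_p^p - \|x\|_{\mu,p}^p \,\right| \qquad (6) $$

and

$$ \underline{\delta}_{\mathcal{S},\mu,p} := \inf_{x \in \mathcal{S}} \|x\|_{\mu,p}^p. $$

The linear map $L$ satisfies the RIP on $\mathcal{S} \subset \mathbb{S}$ with constant $\delta$ if $\delta_{\mathcal{S},\mu,p} \le \delta < \underline{\delta}_{\mathcal{S},\mu,p}$.

It will also be convenient to define

$$ \bar{\delta}_{\mathcal{S},\mu,p} := \sup_{x \in \mathcal{S}} \|x\|_{\mu,p}^p. $$

To simplify notations, we substitute $\delta_p$, $\underline{\delta}_p$, and $\bar{\delta}_p$ for $\delta_{\mathcal{S},\mu,p}$, $\underline{\delta}_{\mathcal{S},\mu,p}$, and $\bar{\delta}_{\mathcal{S},\mu,p}$, respectively, but one should keep in mind that these quantities depend on $\mathcal{S}$, $\mu$ and $p$.

We remark that if $L$ satisfies the RIP on $\mathcal{S}$ with constant $\delta$ then

$$ \|x\|_{\mu,p}^p - \delta \le \|L(x)\|_p^p \le \|x\|_{\mu,p}^p + \delta $$

for all $x \in \mathcal{S}$. The condition $\delta < \underline{\delta}_p$ ensures that no vector $x \in \mathcal{S}$ is in the null space of $L$. Indeed, we have

$$ \|L(x)\|_p^p \ge \underline{\delta}_p - \delta > 0 $$

for all $x \in \mathcal{S}$.

In the remaining part of this section, we give generic sufficient conditions to ensure that the RIP holds with high probability. We will see how to recover classical RIP results in finite dimensions with $p = 2$ in Section IV-A.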


B. Concentration inequalities and main result 

If Assumption A holds, the only other ingredients needed to prove that a random linear map $L$ of the form (3) satisfies the RIP with high probability are concentration inequalities. The choice of the probability distribution $\mu$ is thus important to ensure a preservation of the norm of the vectors in $\mathcal{S}$ by $L$. In this section, we assume that the following general concentration inequalities hold.

Assumption B. Define the function

$$ h_{L,\mu,p} : \mathcal{H} \to \mathbb{R}, \qquad x \mapsto \|L(x)\|_p^p - \|x\|_{\mu,p}^p. $$

There exist two constants $c_1, c_2 \in (0, \infty]$ such that for any fixed $y, z \in \mathcal{S} \cup \{0\}$,

$$ \mathbb{P}\left\{ |h_{L,\mu,p}(y) - h_{L,\mu,p}(z)| \ge \lambda \|y - z\| \right\} \le \ldots \qquad (7) $$

for every $0 \le \lambda \le c_2/c_1$, and

$$ \mathbb{P}\left\{ |h_{L,\mu,p}(y) - h_{L,\mu,p}(z)| \ge \lambda \|y - z\| \right\} \le \ldots \qquad (8) $$

for every $\lambda \ge c_2/c_1$.

To simplify notations, we substitute $h_p$ for $h_{L,\mu,p}$ in the remaining part of the paper.

We remark that we can have $c_1 = \infty$ or $c_2 = \infty$. This is to handle the case where one of the above bounds holds for all $\lambda \ge 0$. If $c_1 = \infty$ then (8) holds for all $\lambda \ge 0$. Similarly, if $c_2 = \infty$ then (7) holds for all $\lambda \ge 0$.

We can now state our main theorem from which all the following results are derived.

Theorem II.2. Let $L : \mathcal{H} \to \mathbb{R}^m$ be a random linear map constructed as in (3) using a probability distribution $\mu$. If Assumption A and Assumption B hold, then for any $\xi, \delta \in (0, 1)$, we have $\delta_p \le \delta$ with probability at least $1 - \xi$ provided that

$$ m \ge \ldots \qquad (9) $$

where $C > 0$ is an absolute constant.

Inequality (9) above involves the natural logarithm. The proof of this theorem is available in Appendix ??. The proof is based on a chaining argument, which is a powerful technique to obtain sharp bounds for the supremum of random processes, see, e.g., [?]. This technique can be viewed as a refinement of the classical $\epsilon$-net argument such as used in, e.g., [?], [?].


C. Proof recipe for the RIP 

We can now deduce the following recipe to show that a random linear map $L$ of the form (3) satisfies the RIP on $\mathcal{S}$.

Recipe 1 Recipe to prove that $L$ satisfies the RIP on $\mathcal{S}$
1: Prove that the set $\mathcal{S}$ has finite upper box-counting dimension (Assumption A).
2: Compute $\underline{\delta}_p := \inf_{x \in \mathcal{S}} \mathbb{E}_\mu \|L(x)\|_p^p$.
3: Prove that the concentration inequalities (7) and (8) hold (Assumption B).
4: Choose $m$ such that (9) holds.

The result of this recipe is that the random linear map $L$ satisfies the RIP on the set $\mathcal{S}$ with constant $\delta \in (0, \underline{\delta}_p)$ and probability at least $1 - \xi$ provided that (9) holds.

In many practical scenarios, such as the ones discussed in the next sections, the concentration inequalities (7) and (8) in Step 3 are easy to obtain using well-known concentration inequalities for the sum of independent random variables, see, e.g., [?], [?]. We will also see that the estimation of $\underline{\delta}_p$ simplifies in these cases. Therefore, the most difficult step in the recipe is usually the computation of the upper box-counting dimension of $\mathcal{S}$.

III. A generic construction of L with the RIP

In the previous section, we provided a recipe to prove that a random linear map satisfies the RIP once the probability distribution $\mu$ is given. In this section, we give a generic way of constructing a random linear map which has the RIP on a set $\mathcal{S}$ that satisfies Assumption A, i.e., we give a generic construction of $\mu$.

We divide our construction into two steps. In the first step, we build a continuous linear map $b$ that sends the vectors in $\mathcal{S}$ to a space of finite but potentially large dimension while essentially preserving their norm. In the second step, we further reduce the embedding dimension by multiplication with a random matrix.

A. Mapping to a finite-dimensional subspace 

In this section, we prove that it is always possible to design a continuous linear map $b : \mathcal{H} \to \mathbb{R}^d$, where $d$ is potentially large but finite, such that

$$ \|b(x)\|_b \le \|x\|, \qquad \forall x \in \mathcal{H}, \qquad (10) $$

$$ \|b(x)\|_b \ge 1 - \epsilon_*, \qquad \forall x \in \mathcal{S}, \qquad (11) $$

for some $\epsilon_* \in (0,1)$ and a well-chosen norm $\|\cdot\|_b$. One can remark that the above properties ensure that $b$ already satisfies a RIP but for a potentially large dimension $d$. We will see how to further reduce the dimension in Section III-B.

In the following subsections, we detail a generic method to construct $b$. This construction uses the fact that we can cover $\mathcal{S}$ with precision $\epsilon_*$ as Assumption A holds. Note that this construction can be computationally expensive in practice. Yet, we would like to highlight that one does not have to use such a covering in practice. Indeed, the only required property in the following results is that $b$ satisfies (10) and (11). The way $b$ is constructed is not important as long as these properties hold. In practice, one can use other properties of $\mathcal{S}$ to construct $b$ efficiently. For example, we will see in Section V-A that, for signals sparse in the Haar wavelet basis of $L_2([0,1])$, such a linear map can be built easily using properties of these wavelets in the Fourier basis.



Fig. 1. Construction of $V_{\epsilon_*}$. Top left: cover of $\mathcal{S}$ with $N_{\mathcal{S}}(\epsilon_*)$ balls of radius $\epsilon_*$. Top right: the centres of the balls, indicated by the red crosses, form an $\epsilon_*$-net, denoted by $C(\epsilon_*)$, for $\mathcal{S}$. Bottom left: the linear span of the vectors in $C(\epsilon_*)$ is $V_{\epsilon_*}$. Bottom right: $V_{\epsilon_*}$ approximates $\mathcal{S}$ with precision $\epsilon_*$.

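1) Using an orthonormal basis of a well-chosen subspace: If Assumption A holds, then we can cover $\mathcal{S}$ with a finite number of closed balls. We fix a resolution $\epsilon_*$, find a minimum $\epsilon_*$-net for $\mathcal{S}$, and denote the set of points in this net by $C(\epsilon_*)$. The cardinality of $C(\epsilon_*)$ is $N_{\mathcal{S}}(\epsilon_*)$. Let $V_{\epsilon_*} \subset \mathcal{H}$ be the finite-dimensional linear subspace of $\mathcal{H}$ spanned by $C(\epsilon_*)$, and $P_{V_{\epsilon_*}} : \mathcal{H} \to \mathcal{H}$ be the orthogonal projection onto $V_{\epsilon_*}$.

By construction of $V_{\epsilon_*}$, we have

$$ \sup_{x \in \mathcal{S}} \|x - P_{V_{\epsilon_*}}(x)\| \le \sup_{x \in \mathcal{S}} \; \min_{x_0 \in C(\epsilon_*)} \|x - x_0\| \le \epsilon_*. \qquad (12) $$

The orthogonal projection onto $V_{\epsilon_*}$ thus preserves the norm of all vectors in $\mathcal{S}$ with an error at most $\epsilon_*$, as illustrated in Fig. 1.

Let $(b_1, \ldots, b_d)$ be an orthonormal basis for $V_{\epsilon_*}$ and $b$ be the linear map

$$ b : \mathcal{H} \to \mathbb{R}^d, \qquad x \mapsto \left( \langle b_i, x \rangle \right)_{1 \le i \le d}. $$

We remark that

$$ \|b(x)\|_2 = \|P_{V_{\epsilon_*}}(x)\| \le \|x\|, \qquad \forall x \in \mathcal{H}, $$

and that

$$ \|b(x)\|_2 \ge \|x\| - \|x - P_{V_{\epsilon_*}}(x)\| \ge 1 - \epsilon_*, \qquad \forall x \in \mathcal{S}. $$

To obtain the last inequality, we used inequality (12) and the fact that $\mathcal{S} \subset \mathbb{S}$. We thus proved the desired result with $\|\cdot\|_b = \|\cdot\|_2$.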
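The construction above can be sketched numerically in a finite-dimensional stand-in for $\mathcal{H}$. The following is a hedged illustration, not the authors' implementation: the toy model set (unit-norm Gaussian bumps with a moving centre, sampled in $\mathbb{R}^{100}$), the resolution `eps_star = 0.3`, and the helper `greedy_net` are all assumptions, and a greedy net is used in place of a minimum net. The orthonormal basis of the span of the centres is obtained from an SVD.

```python
import numpy as np

def greedy_net(points, eps):
    """Greedy eps-net: returns centres such that every point is within eps of one."""
    centres = [points[0]]
    for p in points[1:]:
        if np.min(np.linalg.norm(np.asarray(centres) - p, axis=1)) > eps:
            centres.append(p)
    return np.asarray(centres)

# Toy model set S (assumption): unit-norm Gaussian bumps with a moving centre.
grid = np.linspace(0.0, 1.0, 100)
shifts = np.linspace(0.2, 0.8, 500)
S = np.exp(-((grid[None, :] - shifts[:, None]) ** 2) / (2 * 0.05 ** 2))
S /= np.linalg.norm(S, axis=1, keepdims=True)

eps_star = 0.3
C = greedy_net(S, eps_star)                    # centres of the net, one per row

# Orthonormal basis of V = span(C): right singular vectors with non-negligible singular value.
_, sing, Vt = np.linalg.svd(C, full_matrices=False)
B = Vt[sing > 1e-10 * sing[0]]                 # shape (d, 100); rows play the role of b_1, ..., b_d

def b(x):
    """The map x -> (<b_i, x>)_{1 <= i <= d}."""
    return B @ x

norms = np.linalg.norm(S @ B.T, axis=1)        # ||b(x)||_2 for every sampled x in S
print(B.shape[0], norms.min(), norms.max())
```

On the sampled set, the norms should lie between $1 - \epsilon_*$ and $1$, as in (10) and (11); the embedding dimension $d$ is the number of retained singular vectors.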

2) Using an arbitrary basis of $V_{\epsilon_*}$: The linear map $b$ in the last section was constructed using an orthonormal basis of $V_{\epsilon_*}$. To provide more flexibility in the design of the linear map $b$, we propose here to use an arbitrary basis of $V_{\epsilon_*}$. Let $(b_1, \ldots, b_d)$ be such a basis and define

$$ b : \mathcal{H} \to \mathbb{R}^d, \qquad x \mapsto \left( \langle b_i, x \rangle \right)_{1 \le i \le d}. $$

We also define the following norm

$$ \|y\|_b = \inf_{z \in \mathcal{H}} \{\, \|z\| \mid b(z) = y \,\}, $$

for all $y \in \mathbb{R}^d$. We next show that (10) and (11) hold with this choice of norm.

We denote by $\boldsymbol{y}$ the vector with coordinates $y$ in the basis $(b_1, \ldots, b_d)$. Let $V_{\epsilon_*}^\perp$ be the orthogonal complement to $V_{\epsilon_*}$ in $\mathcal{H}$. It is easy to notice that the set of vectors that satisfy $b(z) = y$ is

$$ \{\, z = \boldsymbol{y} + \boldsymbol{y}^\perp \mid \boldsymbol{y}^\perp \in V_{\epsilon_*}^\perp \,\}, $$

i.e., the set of vectors whose orthogonal projection on $V_{\epsilon_*}$ is $\boldsymbol{y}$. Then, by orthogonality, we have $\|z\|^2 = \|\boldsymbol{y}\|^2 + \|\boldsymbol{y}^\perp\|^2$, whose minimum value is $\|\boldsymbol{y}\|^2$. Therefore, $\|y\|_b = \|\boldsymbol{y}\|$ and we deduce that $\|b(x)\|_b = \|P_{V_{\epsilon_*}}(x)\|$ for all $x \in \mathcal{H}$. As we have $\|P_{V_{\epsilon_*}}(x)\| \le \|x\|$ for all $x \in \mathcal{H}$, (10) holds. Then the fact that $\|P_{V_{\epsilon_*}}(x)\| \ge \|x\| - \|x - P_{V_{\epsilon_*}}(x)\| \ge 1 - \epsilon_*$ for all $x \in \mathcal{S}$ shows that (11) holds.

We remark that when $(b_1, \ldots, b_d)$ is an orthonormal basis, we have $\|y\|_b = \|\boldsymbol{y}\| = \|y\|_2$, i.e., the norm we naturally chose in the previous section. The choice of $\|\cdot\|_b$ as norm in $\mathbb{R}^d$ allows us to recover the result of the last section but also to generalise it. Indeed, the result is now independent of the choice of the basis $(b_1, \ldots, b_d)$. As explained in the remark before Recipe 2 below, this result can be useful when the choice of $(b_1, \ldots, b_d)$ is, for example, fixed by the application.


B. Dimensionality reduction with a random matrix 

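The linear map $b$ in the previous section maps the vectors in $\mathcal{S}$ into a space of finite, but potentially large, dimension with an error at most $\epsilon_*$ on the norm of the vectors. The goal is now to reduce the dimension further without degrading much the norm of the vectors in $b(\mathcal{S})$.

Before continuing, we introduce the following definitions (definitions 5.7 and 5.13 in [?]).

Definition III.1 (Subexponential random variable). A subexponential random variable $X$ is a random variable that satisfies

$$ (\mathbb{E}|X|^q)^{1/q} \le C q \quad \text{for all } q \ge 1, $$

with $C > 0$. The subexponential norm of $X$, denoted by $\|X\|_{\Psi_1}$, is the smallest constant $C$ for which the last property holds, i.e.,

$$ \|X\|_{\Psi_1} := \sup_{q \ge 1} \left\{ q^{-1} (\mathbb{E}|X|^q)^{1/q} \right\}. $$

Definition III.2 (Subgaussian random variable). A subgaussian random variable $X$ is a random variable that satisfies

$$ (\mathbb{E}|X|^q)^{1/q} \le C \sqrt{q} \quad \text{for all } q \ge 1, $$

with $C > 0$. The subgaussian norm of $X$, denoted by $\|X\|_{\Psi_2}$, is the smallest constant $C$ for which the last property holds, i.e.,

$$ \|X\|_{\Psi_2} := \sup_{q \ge 1} \left\{ q^{-1/2} (\mathbb{E}|X|^q)^{1/q} \right\}. $$

To reduce the dimension, we draw $m$ independent random vectors $a_1, \ldots, a_m \in \mathbb{R}^d$ according to a probability distribution $\nu$ on $\mathbb{R}^d$. The measurements $a_i^\intercal b(x)$, $i = 1, \ldots, m$, are thus independent and identically distributed random variables. The choice of $b$ and $\nu$ defines the probability distribution $\mu$ in $\mathcal{H}^m$.

To characterise the number of measurements needed to preserve the norm of all vectors in $\mathcal{S}$, we need the following quantities

$$ A_1 := \sup_{x \in (\mathcal{S} - \mathcal{S}) \cup \mathcal{S},\; x \ne 0} \left\{ \|a_i^\intercal b(x)\|_{\Psi_1} \,/\, \|b(x)\|_b \right\} \quad \text{and} \qquad (13) $$

$$ A_2 := \sup_{x \in (\mathcal{S} - \mathcal{S}) \cup \mathcal{S},\; x \ne 0} \left\{ \|a_i^\intercal b(x)\|_{\Psi_2} \,/\, \|b(x)\|_b \right\}. \qquad (14) $$

Note that these quantities may be infinite. However, we can ensure that $A_1$ and $A_2$ are finite, for example, by drawing $a_1, \ldots, a_m$ in the ball of radius 1 with respect to the dual norm of $\|\cdot\|_b$ defined as

$$ \|a\|_{b*} := \sup \left\{\, |a^\intercal x| \;\middle|\; x \in \mathbb{R}^d,\ \|x\|_b \le 1 \,\right\}. $$

Indeed, we have $|a^\intercal b(x)| \le \|b(x)\|_b$ for any $x \in \mathcal{H}$ in this case, which ensures that $A_2 \le 1$ and $A_1 \le 1$.

We are now ready to give the two main results of this section, which follow from Theorem II.2.

Theorem III.3. Assume that $A_1 < +\infty$ and define

$$ L : \mathcal{H} \to \mathbb{R}^m, \qquad x \mapsto \left( \frac{a_i^\intercal b(x)}{m} \right)_{1 \le i \le m}. \qquad (15) $$

There exists an absolute constant $C > 0$ such that if Assumption A holds then, for any $\xi \in (0,1)$ and $\delta \in (0,1)$, with probability at least $1 - \xi$,

$$ \|x\|_{\mu,1} - \delta \le \|L(x)\|_1 \le \|x\|_{\mu,1} + \delta $$

holds for all $x \in \mathcal{S}$ provided that

$$ m \ge \frac{C}{\delta^2} \max(2A_1^2, A_1)\, \max\left\{ s \log(\ldots),\ \log(\ldots) \right\}. \qquad (16) $$

Theorem III.4. Assume that $A_2 < +\infty$ and define

$$ L : \mathcal{H} \to \mathbb{R}^m, \qquad x \mapsto \left( \frac{a_i^\intercal b(x)}{\sqrt{m}} \right)_{1 \le i \le m}. \qquad (17) $$

There exists an absolute constant $C > 0$ such that if Assumption A holds, then, for any $\xi \in (0,1)$ and $\delta \in (0,1)$, with probability at least $1 - \xi$,

$$ \|x\|_{\mu,2}^2 - \delta \le \|L(x)\|_2^2 \le \|x\|_{\mu,2}^2 + \delta $$

holds for all $x \in \mathcal{S}$ provided that

$$ m \ge \frac{C}{\delta^2} \max(8A_2^4, A_2^2)\, \max\left\{ s \log(\ldots),\ \log(\ldots) \right\}. \qquad (18) $$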

Both theorems are proved in Appendix ??. Let us comment on these results and highlight the situations where they are useful. We will study several examples in the next section.

• First, we remark that

$$ \underline{\delta}_p = \inf_{x \in \mathcal{S}} \mathbb{E}_\nu |a_i^\intercal b(x)|^p \le \sup_{x \in \mathcal{S}} \mathbb{E}_\nu |a_i^\intercal b(x)|^p = \bar{\delta}_p $$

with $p = 1$ in Theorem III.3 and $p = 2$ in Theorem III.4. By definition of $A_1$ and $A_2$,

$$ \mathbb{E}_\nu |a_i^\intercal b(x)|^p \le p A_p^p\, \|b(x)\|_b^p \le p A_p^p, \qquad \forall x \in \mathcal{S}, $$

for $p = 1$ and $2$. The last inequality follows from (10). Therefore,

$$ \underline{\delta}_p \le \bar{\delta}_p \le p A_p^p $$

with $p = 1$ in Theorem III.3 and $p = 2$ in Theorem III.4.

• Second, we notice that the RIP is satisfied for $\delta < \underline{\delta}_1$ in the case of Theorem III.3 and for $\delta < \underline{\delta}_2$ in the case of Theorem III.4. Consequently, if $L$ is constructed as in this section, then Recipe 1 can be modified to Recipe 2.

Recipe 2 Recipe to prove that $L$ in (15) or (17) satisfies the RIP on $\mathcal{S}$
1: Prove that the set $\mathcal{S}$ has finite upper box-counting dimension (Assumption A).
2: Compute $\underline{\delta}_p = \inf_{x \in \mathcal{S}} \mathbb{E}_\nu |a_i^\intercal b(x)|^p$, with $p = 1$ for $L$ defined in (15) or $p = 2$ for $L$ defined in (17).
3: Compute $A_1$ or $A_2$.
4: Choose $m$ such that (16) holds if $p = 1$, or such that (18) holds if $p = 2$.

• Third, the number of measurements essentially scales with $s \max(2A_1^2, A_1)/\delta^2$ in the first case and $s \max(8A_2^4, A_2^2)/\delta^2$ in the second case. As we should have $\delta < \underline{\delta}_p$ (with $p = 1$ or $p = 2$) to satisfy the RIP, the results have a practical interest when $\max(2A_1^2, A_1)/\underline{\delta}_1^2$ or $\max(8A_2^4, A_2^2)/\underline{\delta}_2^2$ is small. These ratios are the quantities to optimise when designing $L$, as illustrated in Section IV-B.

• Finally, it is important to notice that the choice of the basis $(b_1, \ldots, b_d)$ and the choice of the distribution $\nu$ for the $a_i$'s interact together in the value of the ratio $\max(2A_1^2, A_1)/\underline{\delta}_1^2$ or $\max(8A_2^4, A_2^2)/\underline{\delta}_2^2$. If one has the flexibility to choose both $(b_1, \ldots, b_d)$ and $\nu$, then one should seek to minimise these ratios in order to minimise $m$. Even if the choice of $b$ is fixed by the application of interest, one still has the flexibility to optimise the distribution $\nu$ to minimise $m$.

In Recipe 2 introduced above, when $(b_1, \ldots, b_d)$ is an orthonormal basis and $a_i$ is an isotropic random vector, then $\underline{\delta}_2$ can be directly estimated using (??), as done below in Section IV-A for the second step of the recipe. We recall that $a_i \in \mathbb{R}^d$ is isotropic if $\mathbb{E}|a_i^\intercal x|^2 = \|x\|_2^2$ for all $x \in \mathbb{R}^d$. To estimate $\underline{\delta}_1$, one can use Lemma D.2, as also done below in Section IV-B2 and Section IV-B4 for the second step of the recipe.


IV. Examples 

In this section, we show how to use our generic recipe on 
different examples. 


A. A linear embedding from $\mathcal{H}$ to $\mathbb{R}^m$

1) The infinite-dimensional case: In Section III-A, we built a linear map $b : \mathcal{H} \to \mathbb{R}^d$ which preserves the norm of all vectors in $\mathcal{S}$ up to $\epsilon_*$. For simplicity, we consider the case where $(b_1, \ldots, b_d)$ is an orthonormal basis. We have

$$ (1 - \epsilon_*)^2\, \|x\|^2 \le \|b(x)\|_b^2 = \|b(x)\|_2^2 \le \|x\|^2, \qquad \forall x \in \mathcal{S}. \qquad (19) $$

Let $a_1, \ldots, a_m \in \mathbb{R}^d$ be $m$ vectors whose entries are independent Gaussian random variables with mean 0 and variance 1, and define $L$ as in Theorem III.4. We follow Recipe 2 to prove that $L$ has the RIP.

1: The set $\mathcal{S}$ has a finite upper box-counting dimension by assumption.

2: To estimate $\underline{\delta}_2$, we first notice that

$$ \mathbb{E}|a_i^\intercal b(x)|^2 = \|b(x)\|_2^2. $$

Inequality (19) then yields

$$ (1 - \epsilon_*)^2\, \|x\|^2 \le \mathbb{E}|a_i^\intercal b(x)|^2 \le \|x\|^2, \qquad \forall x \in \mathcal{S}. $$

Therefore, $\underline{\delta}_2 \ge (1 - \epsilon_*)^2$ and $\bar{\delta}_2 \le 1$.

3: To estimate $A_2$, we use the fact that $\|a_i^\intercal x\|_{\Psi_2} \le D\, \|x\|_2$ for all $x \in \mathbb{R}^d$, where $D > 0$ is an absolute constant (see [?]). Therefore,

$$ \|a_i^\intercal b(x)\|_{\Psi_2} \le D\, \|b(x)\|_2 = D\, \|b(x)\|_b, $$

and $A_2 \le D$.

4: The last step of the recipe proves that $L$ satisfies the RIP with constant $\delta \in (0, (1 - \epsilon_*)^2)$ and probability at least $1 - \xi$, i.e.,

$$ (1 - \epsilon_*)^2 - \delta \le \|L(x)\|_2^2 \le 1 + \delta, \qquad \forall x \in \mathcal{S}, $$

provided that

$$ m \ge \frac{C}{\delta^2} \max\left\{ s \log(\ldots),\ \log(\ldots) \right\}, $$

where $C$ is an absolute constant.

With this example, one can remark that the only difference in the construction of the linear map $L$ between an infinite-dimensional setting and a finite-dimensional setting is the presence of the intermediate mapping $b$. This mapping is built from the projection onto a well-chosen subspace that preserves the norm of all vectors in $\mathcal{S}$. Finally, the number of measurements $m$ is essentially proportional to the intrinsic dimension of $\mathcal{S}$, as usual in CS results.
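As a hedged numerical companion to the four steps above, the sketch below builds $L$ in the form of Theorem III.4 in a finite-dimensional setting where $b$ is taken to be the identity (an assumption for illustration), and measures $|\,\|L(x)\|_2^2 - 1\,|$ on randomly sampled sparse unit vectors. The sizes `n`, `k`, `m` and the helper `random_sparse_unit` are hypothetical choices. Sampling finitely many points does not certify the RIP, which is a supremum over all of $\mathcal{S}$; it only shows the concentration that the recipe relies on.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 200, 3, 400

# Rows a_i^T / sqrt(m) with standard Gaussian a_i (b is the identity in this toy case).
A = rng.standard_normal((m, n)) / np.sqrt(m)

def random_sparse_unit(rng, n, k):
    """A k-sparse vector of R^n with unit Euclidean norm and random support."""
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    return x / np.linalg.norm(x)

X = np.stack([random_sparse_unit(rng, n, k) for _ in range(500)])

# Deviation | ||L(x)||_2^2 - 1 | on each sampled vector.
deviation = np.abs(np.linalg.norm(X @ A.T, axis=1) ** 2 - 1.0)
print(deviation.max())
```

Increasing `m` should shrink the largest observed deviation roughly like $1/\sqrt{m}$, in line with the $1/\delta^2$ dependence of the measurement bounds.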

2) Recovery of known results in finite dimension: The above result holds for any set $\mathcal{S}$ that satisfies Assumption A in a Hilbert space $\mathcal{H}$, so it also holds for signal models in $\mathbb{R}^n$ provided that they have a finite upper box-counting dimension. Let us take one example in a finite-dimensional ambient space.

Consider the set $\mathcal{S}$ of $2k$-sparse signals with unit $\ell_2$-norm in $\mathbb{R}^n$. This set can be covered by at most $[3en/(2k\epsilon)]^{2k}$ balls of radius $\epsilon \in (0,1)$ [?]. Its upper box-counting dimension is thus $2k$. Let us compute $\epsilon_{\mathcal{S}}$ and $s$ which appear in Assumption A. We recall that $\epsilon_{\mathcal{S}}$ is a constant such that $[3en/(2k\epsilon)]^{2k} \le \epsilon^{-s}$ for all $\epsilon \le \epsilon_{\mathcal{S}}$ with $s > 2k$. Writing $s = 2k + \eta$ with $\eta > 0$, we should have $(3en/(2k))^{2k} \le \epsilon^{-\eta}$ for all $\epsilon \le \epsilon_{\mathcal{S}}$, or, equivalently, $(3en/(2k))^{2k} \le \epsilon_{\mathcal{S}}^{-\eta}$. We take $\epsilon_{\mathcal{S}} = 2k/(3en)$ and $\eta = 2k$. Notice that $\epsilon_{\mathcal{S}} \le 1/2$ and that $s = 4k$. As we are in finite ambient dimension, we can take $V_{\epsilon_*} = \mathbb{R}^n$ (which implies that $\epsilon_* = 0$ and $d = n$) and the canonical basis for $(b_1, \ldots, b_n)$ (which implies that $b$ is the identity). The complete linear map $L$ thus reduces to the matrix $A \in \mathbb{R}^{m \times n}$ with rows $a_i^\intercal/\sqrt{m}$, which satisfies, with probability at least $1 - \xi$,

$$ (1-\delta)\,\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\,\|x\|_2^2, \qquad (20) $$

for all $2k$-sparse vectors $x$ provided that $m$ satisfies

$$ m \ge \frac{C}{\delta^2} \max\left\{ 4k \log(\ldots),\ \log(\ldots) \right\}. $$

In particular, this result shows that the set of $k$-sparse vectors is stably embedded into $\mathbb{R}^m$ for $m$ satisfying the above inequality.
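The choice $\epsilon_{\mathcal{S}} = 2k/(3en)$ and $s = 4k$ can be checked with a few lines of arithmetic. This is a small sketch under assumed values of `n` and `k` (they are not taken from the paper); it works with logarithms to avoid overflow and verifies that the covering bound $[3en/(2k\epsilon)]^{2k}$ stays below $\epsilon^{-s}$ for several $\epsilon \le \epsilon_{\mathcal{S}}$. The helper name `log_cover_bound` is hypothetical.

```python
import math

def log_cover_bound(n, k, eps):
    """Logarithm of the covering bound [3en / (2k eps)]^(2k) quoted in the text."""
    return 2 * k * math.log(3 * math.e * n / (2 * k * eps))

n, k = 1000, 10                       # assumed example sizes
eps_S = 2 * k / (3 * math.e * n)      # model-set constant chosen in the text
s = 4 * k                             # s = 2k + eta with eta = 2k

# Assumption A asks for N_S(eps) <= eps^(-s), i.e. log N_S(eps) <= s * log(1/eps).
test_eps = [eps_S, eps_S / 2, eps_S / 10, eps_S / 1000]
holds = [log_cover_bound(n, k, e) <= s * math.log(1 / e) + 1e-9 for e in test_eps]
print(eps_S, holds)
```

At $\epsilon = \epsilon_{\mathcal{S}}$ the two sides coincide, which is why a small numerical tolerance is added; for smaller $\epsilon$ the inequality is strict.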

In comparison, Theorem 9.2 in [?] shows that $A$ satisfies (20) with probability at least $1 - \xi$ provided that

$$ m \ge \frac{C'}{\delta^2} \left( 2k \log(\ldots) + \log(\ldots) \right), $$

where $C' > 0$ is an absolute constant. This condition on $m$ is similar to ours. This shows that we recover results similar to known ones in CS.

Note that we could have chosen any orthonormal basis of $\mathbb{R}^n$ for $(b_1, \ldots, b_n)$. This illustrates the known universality of subgaussian measurement matrices relative to the sparsity basis [?].

Following the same procedure, one can easily recover many other similar results for, e.g., the set of low-rank matrices [?], low-dimensional smooth manifolds [?], or the set of group-sparse signals [?]. In all these cases, the only remaining difficulty is estimating the upper box-counting dimension of the set of interest.


B. Embedding of matrices with rank-one projections 

As a second example, we propose to use our results to show that one can embed a low-dimensional set of matrices using rank-one projections. This scheme was proposed and studied in, e.g., [?], [?], [?], [?] to embed certain low-dimensional sets of matrices: low-rank matrices, sparse matrices, low-rank+sparse matrices, etc. We extend here these results to any set with a low-dimensional normalised secant set that satisfies Assumption A which, e.g., includes matrix manifolds.

In this section, the ambient space is $\mathbb{R}^{n_1 \times n_2}$, the space of real matrices of size $n_1 \times n_2$, equipped with the Frobenius norm $\|\cdot\|_{\mathrm{Frob}}$. We consider that $\mathcal{S} \subset \mathbb{R}^{n_1 \times n_2}$ is a low-dimensional subset of matrices that satisfies Assumption A with $\|\cdot\|_{\mathrm{Frob}}$ instead of $\|\cdot\|$.

1) Measurement strategy: As done in [?], [?], [?], [?], we propose to measure a matrix $M \in \mathbb{R}^{n_1 \times n_2}$ using $m$ rank-one projections:

$$ L : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m, \qquad M \mapsto \left( \frac{a_i^\intercal M b_i}{m} \right)_{1 \le i \le m}, \qquad (21) $$

where $a_1, \ldots, a_m \in \mathbb{R}^{n_1}$ and $b_1, \ldots, b_m \in \mathbb{R}^{n_2}$ are independent random vectors. Note that the main advantage of this measurement strategy is its memory efficiency. Indeed, we only need to store $m(n_1 + n_2)$ coefficients to compute the measurements while the measurement strategy proposed in [?] requires storing $m n_1 n_2$ coefficients.

We will pay attention to two types of random vectors. First, we will focus on the case where the entries of the vectors $a_i$ and $b_i$, $i = 1, \ldots, m$, are independent draws of a random variable $N \in \mathbb{R}$ that has a standard normal distribution. Second, we will discuss the case where the entries of the vectors $a_i$ and $b_i$, $i = 1, \ldots, m$, are independent draws of a random variable $P \in \{0, \pm\sqrt{q}\}$ that satisfies

$$ \mathbb{P}(P = 0) = 1 - \frac{1}{q} \quad \text{and} \quad \mathbb{P}(P = \pm\sqrt{q}) = \frac{1}{2q}, \qquad (22) $$

where $q \ge 1$. The larger $q$ is, the sparser (on average) the measurement vectors are, hence improving the computational efficiency of the measurement strategy. In this case, the average sparsity of the vectors $a_i$ and $b_i$ is $n_1/q$ and $n_2/q$, respectively, and computing one measurement can be done in $O(n_1 n_2/q^2)$ operations. This type of measurement strategy with sparse vectors was proposed in [?] for Johnson-Lindenstrauss embeddings and can also be used for compressive sensing [?]. We will see that the number of measurements needed to preserve the norm of all vectors in $\mathcal{S}$ depends on the parameter $q$.
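The sparse distribution (22) and a single rank-one projection can be sketched as follows. This is an illustration under assumptions: the sizes `n1`, `n2`, the value `q = 4`, and the helper `draw_sparse` are hypothetical, and the test matrix `M` is simply a Gaussian matrix. The sketch computes $a^\intercal M b$ using only the rows and columns on the supports of $a$ and $b$, compares it with the full inner product $\langle a b^\intercal, M \rangle$, and checks empirically that $P$ has zero mean, unit variance, and a fraction $1/q$ of non-zero draws.

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_sparse(rng, size, q):
    """Entries equal to 0 w.p. 1 - 1/q and to +sqrt(q) or -sqrt(q) w.p. 1/(2q) each, as in (22)."""
    values = np.array([0.0, np.sqrt(q), -np.sqrt(q)])
    probs = np.array([1.0 - 1.0 / q, 0.5 / q, 0.5 / q])
    return rng.choice(values, size=size, p=probs)

q, n1, n2 = 4.0, 60, 80
M = rng.standard_normal((n1, n2))
a = draw_sparse(rng, n1, q)
b_vec = draw_sparse(rng, n2, q)

# One rank-one projection a^T M b, touching only the supports of a and b.
ia, ib = np.flatnonzero(a), np.flatnonzero(b_vec)
fast = a[ia] @ M[np.ix_(ia, ib)] @ b_vec[ib]

# Reference value <a b^T, M>, which forms the full n1 x n2 outer product.
slow = np.sum(np.outer(a, b_vec) * M)

samples = draw_sparse(rng, 200_000, q)
print(fast, slow, samples.mean(), samples.var(), np.mean(samples != 0))
```

The support-restricted product involves about $(n_1/q)(n_2/q)$ entries of $M$ on average, which matches the $O(n_1 n_2/q^2)$ cost quoted above.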

2) RIP for rank-one projections with Gaussian vectors: In
this section, we study the case where the entries of the vectors
$a_i$ and $b_i$ are independent draws of the random variable $N$.
As the linear map $L$ has the form presented in Theorem III.3
(with $b$ the identity and $\|\cdot\| = \|\cdot\|_{\mathrm{Frob}}$), we can follow the recipe
to prove that $L$ has the RIP.

1: The set S has a finite upper box-counting dimension by
assumption.

2: To estimate $\delta_1$, we use the result presented after
Lemma D.2, which shows that there exists an absolute
constant $D > 0$ such that

$$\delta_1 := \inf_{M \in S} \mathbb{E}\,\left|a_i^{\mathsf T} M b_i\right| \ge D.$$

Then, using the fact that

$$\mathbb{E}\,\left|a_i^{\mathsf T} M b_i\right| \le \left(\mathbb{E}\,\left|a_i^{\mathsf T} M b_i\right|^2\right)^{1/2} = \|M\|_{\mathrm{Frob}},$$

we also deduce that $\delta_1 \le 1$.

3: To estimate $\Lambda_1$, we use Lemma D.1, which shows that there
exists an absolute constant $E > 0$ such that

$$\left\|a_i^{\mathsf T} M b_i\right\|_{\psi_1} \le E\,\|M\|_{\mathrm{Frob}}$$

for any fixed matrix $M$. Therefore $\Lambda_1 \le E$.

4: The last step of the recipe proves that $L$ satisfies the RIP
with constant $\delta \in (0, D)$ and probability at least $1 - \xi$, i.e.,

$$\delta_1 - \delta \le \|L(M)\|_1 \le 1 + \delta, \qquad \forall M \in S,$$

provided that

$$m \ge \frac{C}{\delta^2} \max\left\{ s \log\left(\frac{1}{\epsilon_S}\right),\ \log\left(\frac{1}{\xi}\right) \right\},$$

where $C > 0$ is an absolute constant. The number of
measurements $m$ thus scales linearly with $s$. This result
is true for any set S that satisfies this assumption and is not
restricted to the set of low-rank matrices.
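To get a feel for the sufficient condition on $m$ in step 4, one can evaluate its right-hand side numerically. The sketch below is only illustrative: the absolute constant $C$ is not specified in the text, so the default `C=1.0` is a placeholder assumption, and the function name `required_measurements` is hypothetical.

```python
import math


def required_measurements(s, eps_s, delta, xi, C=1.0):
    """Smallest integer m with m >= (C / delta^2) * max(s * log(1/eps_s), log(1/xi)).

    C is a placeholder for the unspecified absolute constant.
    """
    if not (0.0 < eps_s < 1.0 and 0.0 < xi < 1.0 and delta > 0.0):
        raise ValueError("need 0 < eps_s < 1, 0 < xi < 1 and delta > 0")
    bound = (C / delta ** 2) * max(s * math.log(1.0 / eps_s), math.log(1.0 / xi))
    return math.ceil(bound)
```

For a fixed `delta`, `xi` and `eps_s`, the returned value grows linearly with `s` once the first term of the maximum dominates, and it grows like `1 / delta**2` as the RIP constant shrinks.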


Let us highlight that the result presented in this section
involves an $\ell_1$-norm in the measurement domain, instead of
the $\ell_2$-norm as in most CS results. The authors of (U and
GD also chose the $\ell_1$-norm in the measurement domain to
prove the RIP. In addition, the authors in proved that if
one chooses the $\ell_2$-norm in the measurement domain for the
RIP, then at least $O(n_1 n_2)$ measurements are needed to ensure
that $L$ satisfies a RIP condition that guarantees the recovery
of rank-1 matrices (see Lemma 2.1 in |T^ and the discussion
thereafter), which is much larger than $n_1 + n_2 - 1$, the number
of degrees of freedom for rank-1 matrices. Choosing the $\ell_1$-norm
in the measurement domain solves this issue.
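The quantity $\mathbb{E}\,|a^{\mathsf T} M b|$ that enters the $\ell_1$-type RIP can be estimated by simulation. The following Monte Carlo sketch is an illustration only; the function names, the sample size and the seed are assumptions of this example. For a rank-one matrix $M = u v^{\mathsf T}$ with unit vectors $u$, $v$ and Gaussian $a$, $b$, the expectation is $(\mathbb{E}|g|)^2 = 2/\pi$ for a standard normal $g$, which stays below $\|M\|_{\mathrm{Frob}} = 1$.

```python
import math
import random


def frobenius_norm(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in M for x in row))


def estimate_mean_abs_projection(M, num_samples=20000, seed=0):
    """Monte Carlo estimate of E|a^T M b| for standard Gaussian vectors a and b."""
    rng = random.Random(seed)
    n1, n2 = len(M), len(M[0])
    total = 0.0
    for _ in range(num_samples):
        a = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        total += abs(sum(a[i] * M[i][j] * b[j] for i in range(n1) for j in range(n2)))
    return total / num_samples
```

Calling `estimate_mean_abs_projection([[1.0, 0.0], [0.0, 0.0]])` should return a value close to $2/\pi \approx 0.64$, up to Monte Carlo error.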

3) Recovery of a known result: Let now S be the set of
rank-$2r$ matrices with unit Frobenius norm. Lemma 3.1 in
shows that $N_S(\epsilon) \le (9/\epsilon)^{2r(n_1+n_2+1)}$ for all $\epsilon > 0$. The upper
box-counting dimension of S is thus $2r(n_1+n_2+1)$. Using
the procedure of Section IV-A2, we can satisfy Assumption
by taking $\epsilon_S = 1/9$ and $s = 4r(n_1 + n_2 + 1)$. The above
result then shows that for $\delta \in (0, D)$, with probability at least
$1 - \xi$,

$$\delta_1 - \delta \le \|L(M)\|_1 \le 1 + \delta, \qquad \forall M \in S,$$

provided that

$$m \ge \frac{C}{\delta^2} \max\left\{ 2r(n_1 + n_2 + 1) \log(9),\ \log\left(\frac{1}{\xi}\right) \right\},$$

where $C > 0$ is an absolute constant. In particular, the set of
rank-$r$ matrices is stably embedded into $\mathbb{R}^m$ for $m$ satisfying
the above inequality.

Let us compare this result with Theorem 2.2 in p^ . 

Theorem IV.1 (Theorem 2.2, jl^ ). Let $L : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$
be the linear map defined in (21). For positive numbers $C_1 <
1/3$ and $C_2 > 1$, there exist constants $C$ and $\delta$, not depending
on $n_1$, $n_2$, and $r$, such that if $m \ge C r(n_1 + n_2)$, then with
probability at least $1 - e^{-m\delta}$, $L$ satisfies

$$C_1 \|M\|_{\mathrm{Frob}} \le \|L(M)\|_1 \le C_2 \|M\|_{\mathrm{Frob}},$$

for all rank-$2r$ matrices $M \in \mathbb{R}^{n_1 \times n_2}$.

Comparing our result and the theorem above, we notice 
that we recover the same type of result with a number of 
measurements that needs to be essentially proportional to the 
upper box-counting dimension of the set of rank-r matrices in 
order to guarantee that the Frobenius norm of these matrices 
is preserved. 

4) RIP for rank-one projections with sparse vectors: To
improve the computational efficiency of the measurement
procedure, one can think of using sparse vectors $a_i$ and $b_i$.
Let us thus discuss the case where the entries of the vectors
$a_i$ and $b_i$ are independent draws of the sparse random variable
$P$ according to (22). The recipe still works to prove that $L$ has
the RIP.

^ These numbers depend on the subgaussian norm of the measurement 
vectors ai and bi. 



1: The set S has a finite upper box-counting dimension by
assumption.

2: To estimate $\delta_1$, we use the result presented after
Lemma D.2, which shows that there exists an absolute
constant $D > 0$ such that

$$\delta_1 := \inf_{M \in S} \mathbb{E}\,\left|a_i^{\mathsf T} M b_i\right| \ge \frac{D}{q(1 + \log q)},$$

for $q \ge 2$. We also have $\delta_1 \le 1$, as with Gaussian random
measurement vectors.

3: To estimate $\Lambda_1$, we use Lemma D.1, which shows that $\Lambda_1 \le
E q$, where $E > 0$ is an absolute constant.

4: The last step of the recipe proves that $L$ satisfies the RIP
with constant $\delta \in (0, D/(q + q \log q))$ and probability at
least $1 - \xi$, i.e.,

$$\delta_1 - \delta \le \|L(M)\|_1 \le 1 + \delta, \qquad \forall M \in S,$$

provided that

$$m \ge \frac{C q^2}{\delta^2} \max\left\{ s \log\left(\frac{1}{\epsilon_S}\right),\ \log\left(\frac{1}{\xi}\right) \right\},$$

where $C > 0$ is an absolute constant.

The number of measurements $m$ thus still scales linearly
with $s$ but also with $q^4(1 + \log q)^2$, when $\delta < D/(q + q \log q)$.
As the (average) cost of computing one measurement is
$O(n_1 n_2/q^2)$, this result is too weak in general to prove any
benefit of using sparse measurement vectors.

5) Discussion: Let us highlight that the above result is 
derived without using any properties of S beyond its box 
counting dimension. In particular, this result also applies for 
sets S of sparse matrices for which we guess that using very 
sparse measurement vectors is not an adequate measurement 
strategy. Indeed, we need the support of the measurement 
vectors and the support of the vectors in the sparse matrix 
to both overlap in order to have non-zero measurements. As 
the measurement vectors get sparser, we increase the chance of 
having only null measurements, and thus decrease the chance 
of embedding the original matrix. This fact is reminiscent of 
the fundamental incoherence property between the sparsity 
domain and the sampling domain required in compressed 
sensing: one cannot sample a signal in the domain where 
it is sparse. More optimistic results might be obtained by
exploiting additional structures in S. In that perspective, the
work [24] on sparse dimensionality reduction in Euclidean
space might be particularly useful. The authors of [24] prove that, with high probability,
some random sparse measurement matrices satisfy the RIP
for a wide class of subsets of the unit sphere. They provide
sufficient conditions on the number of measurements and on
the sparsity of the measurement matrix to ensure that the
RIP is satisfied. These bounds depend on additional properties
of S beyond its box-counting dimension. For example, for
sparse signals in an orthonormal basis, they further require
an incoherence property to ensure that the RIP holds (Section
5.1, [24]).

This example illustrates the importance of the ratio 
max(2Ai, Ai)/^i in the design of the linear map L. Even 










though the measurement vectors $a_i$, $b_i$ are subgaussian random
vectors as in Section IV-B2, this ratio is
too large and the result does not predict any gain in the
computational efficiency compared to Gaussian rank-one measurements.

V. Related works 

A. Infinite dimensional CS and generalised sampling 

An extension of the CS theory to an infinite dimensional 
setting is also studied in 0- In their work, the ambient space 
is a separable Hilbert space 1-L and the authors concentrate on 
the particular case of sparse signals in an orthonormal basis 
of H. Their result does not involve a RIP but the linear map 
used to sense sparse signals shares some similarities with our 
construction of L. 

Let us take the example of $s$-sparse signals in an orthonormal
basis $(\psi_j)_{j \in \mathbb{N}}$ of $\mathcal{H}$, e.g., one can think of a wavelet basis.
For simplicity, let us consider that the support of the signals in
S belongs to $\{0, \ldots, n-1\}$, hence S has a finite upper box-counting
dimension. In Section IV-A we measure the signals
by first projecting them onto a finite dimensional subspace of
large dimension $d$ that approximates the set S. To construct
this subspace, we propose here to use another orthonormal
basis $(\phi_i)_{i \in \mathbb{N}}$ of $\mathcal{H}$, e.g., one can think of the Fourier basis.

We choose to construct this finite dimensional subspace by
using only the first $d$ basis vectors $\phi_i$. Let us define the matrix
$U = (u_{ij} := \langle \phi_i, \psi_j \rangle)_{i,j}$. Let $\alpha$ denote the coordinates of
$x \in \mathcal{H}$ in $(\psi_j)_{j \in \mathbb{N}}$. We have $b(x) = P_d U \alpha \in \mathbb{R}^d$ where $P_d$
is the matrix that selects the first $d$ rows of $U$. According to
our recipe, we need to choose $d$ such that

$$(1 - \epsilon^*)\,\|x\| \le \|b(x)\|_2 \le \|x\|, \qquad \forall x \in S,$$

in order to have the RIP. Here, this condition is equivalent to

$$(1 - \epsilon^*)\,\|U P_n \alpha\|_2 \le \|P_d U P_n \alpha\|_2 \le \|U P_n \alpha\|_2, \tag{23}$$

for all $U P_n \alpha \in S$, where $P_n$ is the matrix that keeps
the first $n$ entries of $\alpha$ and sets all the other ones to 0. Using
the facts that $\|U P_n \alpha\|_2 = \|P_n \alpha\|_2$ and $P_n^{\mathsf T} P_n = P_n P_n = P_n$,
taking the square of (23) and rearranging the terms, one can
notice that the condition

$$\left\|P_n U^{\mathsf T} P_d^{\mathsf T} P_d U P_n - P_n\right\|_2 \le \epsilon^* \tag{24}$$

is sufficient to ensure (23).

It is interesting to notice that this condition also appears in
the generalised sampling theorem of [25]. Indeed, when $U$ is
an isometry as in the present example, the condition for stable
reconstruction presented in [25] amounts to the control of
$\|P_n U^{\mathsf T} P_d^{\mathsf T} P_d U P_n - P_n\|_2$ (see Section 5 in [25] and the proofs
therein). It is also proved in [25] that this quantity tends to 0 as
$d$ tends to infinity. This shows that to choose $d$ in practice, one
just needs to increase it until (24) is satisfied. In the particular
case where $\mathcal{H} = L_2([0,1])$, $(\psi_j)_{j \in \mathbb{N}}$ is the Haar wavelet basis
(ordered from coarse to fine scale), and $(\phi_i)_{i \in \mathbb{N}}$ is the Fourier
basis (ordered from low to high frequencies), the authors of
[25] further prove that (24) is satisfied for $d = O(n^2)$, and
their numerical experiments suggest that it is already sufficient
to take $d = O(n)$ (see Section 5.4 in [25]). This indicates
that this subspace does not always need to be a very large dimensional
subspace of $\mathcal{H}$ to ensure that the RIP holds.

Finally, we note that condition (24) also appears in the
extension of the CS theory to an infinite dimensional setting
discussed at the beginning of this section, involving however another operator norm.
This property is called the weak balancing property there.
This property is called the weak balancing property in j^. 

B. Other measures of dimension?

In his monograph GD , Robinson studies the problem of embedding
a compact subset $\Sigma$ of an infinite-dimensional (Hilbert
or Banach) space into $\mathbb{R}^m$. He considers different definitions
of dimension, in particular, the Hausdorff dimension (denoted
$\dim_H$ hereafter) and the upper box-counting dimension. He
shows that embedding the set $\Sigma$ into $\mathbb{R}^m$ is possible as soon
as $\dim_H(\Sigma - \Sigma) < m$ but that no modulus of continuity
is possible for the inverse of the embedding, i.e., the embedding is not necessarily
stable. One should further assume that the upper box-counting
dimension of $\Sigma$ is finite in order to find linear embeddings
with Hölder continuous inverses.

In our work, we show that a stable embedding of a set $\Sigma$
into finite dimensions exists when the upper box-counting dimension
of the normalised secant set S of $\Sigma$ is finite. Compared
to GD which uses the dimension of the secant set $\Sigma - \Sigma$, we
are thus adding another constraint by normalising the secant
set. An interesting question is thus whether we can change
our measure of dimension and still guarantee the existence of
stable linear embeddings or not. A first step in answering this
question is checking if a finite upper box-counting dimension
of S is a necessary condition for the existence of linear maps
that satisfy the RIP. The following example shows that it is
only a sufficient condition, which leaves room for a refinement
of our results. This example is inspired by the construction of
the “orthogonal sequence” in GD

Let $\mathcal{H}$ be the space of square-summable sequences,
and define the correlated sequence $\Sigma = \{x_i\}_{i \ge 1}$ with $x_i =
r^i (e_i + b\,e_0)$, where $(e_i)_{i \in \mathbb{N}}$ is the standard orthonormal basis
of $\mathcal{H}$, $r \in (0,1)$ and $b > 0$. Now consider the linear map
$L : \mathcal{H} \to \mathbb{R}$ defined by $L(x) = \langle x, e_0 \rangle$. Let us show that the
linear map $L$ stably embeds the set $\Sigma$. Clearly,

$$|\langle x_i - x_j, e_0 \rangle| \le \|x_i - x_j\| \quad \text{for all } x_i, x_j \in \Sigma,$$

so let us consider the lower isometry bound. We want to show
the existence of an $\alpha > 0$ such that

$$|\langle x_i - x_j, e_0 \rangle| \ge \alpha\, \|x_i - x_j\|, \qquad \forall j > i \ge 1,$$

or equivalently,

$$b^2 (r^i - r^j)^2 \ge \alpha^2 \left( r^{2i} + r^{2j} + b^2 (r^i - r^j)^2 \right), \qquad \forall j > i \ge 1.$$

Re-arranging the terms gives

$$\frac{b^2 (1 - r^{j-i})^2}{1 + r^{2(j-i)} + b^2 (1 - r^{j-i})^2} \ge \alpha^2$$

for all $j > i \ge 1$. We can thus choose

$$\alpha^2 = \min_{j > i \ge 1} \frac{b^2 (1 - r^{j-i})^2}{1 + r^{2(j-i)} + b^2 (1 - r^{j-i})^2} \ge \frac{b^2 (1 - r)^2}{1 + r^2 + b^2} > 0.$$

Hence, $L$ satisfies the RIP on S, the normalised secant set of
$\Sigma$:

$$\alpha \le |\langle y, e_0 \rangle| \le 1, \quad \text{for all } y \in S.$$

Consequently, $L$ stably embeds the set $\Sigma$. Let us now calculate
the upper box-counting dimension of S. Consider the
following infinite sequence of vectors from S,

$$v_k := \frac{x_{2k} - x_{2k+1}}{\|x_{2k} - x_{2k+1}\|} = \frac{e_{2k} - r\,e_{2k+1} + b(1 - r)\,e_0}{\sqrt{1 + r^2 + b^2 (1 - r)^2}},$$

with $k \ge 1$. Note that each $v_k$ is the only point in the sequence
that has a non-zero component in the direction $e_{2k}$. Therefore,

$$\|v_k - v_{k'}\| \ge \frac{1}{\sqrt{1 + r^2 + b^2 (1 - r)^2}}, \qquad \forall k \ne k'.$$

Thus, for all $\epsilon < 2^{-1} [1 + r^2 + b^2 (1 - r)^2]^{-1/2}$, it is impossible
to cover the set $\{v_k\}_{k \ge 1} \subset S$ with a finite $\epsilon$-cover. Hence,
$\dim(S) = \infty$. We thus have a set S of infinite upper box-counting
dimension for which there exists a linear map that
satisfies the RIP. A finite upper box-counting dimension of the
normalised secant set is thus not necessary for the existence
of a stable linear embedding.
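The two claims of this example can be checked numerically on finitely many indices. The sketch below is illustrative only: the function names are hypothetical, and the closed-form expressions simply restate the computations above for $x_i = r^i(e_i + b\,e_0)$.

```python
import math


def secant_ratio(i, j, r, b):
    """|<x_i - x_j, e_0>| / ||x_i - x_j|| for x_i = r^i (e_i + b e_0), with i != j."""
    numerator = b * abs(r ** i - r ** j)
    denominator = math.sqrt(r ** (2 * i) + r ** (2 * j) + (b * (r ** i - r ** j)) ** 2)
    return numerator / denominator


def lower_isometry_bound(r, b):
    """The explicit lower bound b (1 - r) / sqrt(1 + r^2 + b^2) on the ratio above."""
    return b * (1.0 - r) / math.sqrt(1.0 + r ** 2 + b ** 2)


def distance_between_v(r, b):
    """||v_k - v_k'|| for k != k': the e_0 components cancel, the other four remain."""
    return math.sqrt(2.0 * (1.0 + r ** 2) / (1.0 + r ** 2 + b ** 2 * (1.0 - r) ** 2))


def separation_bound(r, b):
    """The bound 1 / sqrt(1 + r^2 + b^2 (1 - r)^2) used to rule out finite covers."""
    return 1.0 / math.sqrt(1.0 + r ** 2 + b ** 2 * (1.0 - r) ** 2)
```

For instance, with `r = 0.5` and `b = 1.0`, `secant_ratio(1, 2, 0.5, 1.0)` is about 0.41, above `lower_isometry_bound(0.5, 1.0)`, which is about 0.33.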

Let us terminate this section by highlighting an important
fact. We mentioned earlier that the upper box-counting dimension
of S is often twice the upper box-counting dimension of $\Sigma$ (or of a bounded
restriction of $\Sigma$ if $\Sigma$ is not bounded). It is however not always the case. Indeed,
let us calculate the upper box-counting dimension of $\Sigma$. Let
$\epsilon > 0$. We note that

$$\|x_i\| = r^i (1 + b^2)^{1/2}.$$

Let $i^*(\epsilon)$ be the smallest integer such that $r^{i^*(\epsilon)} (1 + b^2)^{1/2} \le
\epsilon$. For $\epsilon$ small enough, we have $i^*(\epsilon) > 2$. Hence, we can
cover $\{x_i\}_{i \ge i^*(\epsilon)}$ with a single $\epsilon$-ball at 0. Then, the remaining
$i^*(\epsilon) - 1$ points, $\{x_i\}_{1 \le i < i^*(\epsilon)}$, can be separately covered by at
most $i^*(\epsilon) - 1$ balls of radius $\epsilon$ centered at each $x_i$. This gives
$N_\Sigma(\epsilon) \le i^*(\epsilon)$. Then,

$$\frac{\log N_\Sigma(\epsilon)}{-\log \epsilon} \le \frac{\log i^*(\epsilon)}{-(i^*(\epsilon) - 1) \log r - \frac{1}{2} \log(1 + b^2)}.$$

Hence,

$$\dim(\Sigma) = \limsup_{\epsilon \to 0} \frac{\log N_\Sigma(\epsilon)}{-\log \epsilon} \le \limsup_{i^* \to \infty} \frac{\log i^*}{-(i^* - 1) \log r - \frac{1}{2} \log(1 + b^2)} = 0.$$

Since $\dim(\Sigma)$ cannot be negative, we have $\dim(\Sigma) = 0$. We
thus have an example of a set $\Sigma$ with a null upper box-counting
dimension but whose normalised secant set S has an infinite
upper box-counting dimension. In general, one cannot directly
deduce the dimension of S from the dimension of $\Sigma$.
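The decay of the covering ratio can be observed numerically. This is a small sketch under the notation above; the function names are hypothetical, and the ratio computed uses the bound $N_\Sigma(\epsilon) \le i^*(\epsilon)$ directly, divided by $-\log\epsilon$.

```python
import math


def i_star(eps, r, b):
    """Smallest integer i >= 1 such that r^i * sqrt(1 + b^2) <= eps."""
    i = 1
    while r ** i * math.sqrt(1.0 + b * b) > eps:
        i += 1
    return i


def covering_ratio_bound(eps, r, b):
    """Upper bound log(i*(eps)) / (-log eps) on log N(eps) / (-log eps), for eps < 1."""
    return math.log(i_star(eps, r, b)) / (-math.log(eps))
```

With `r = 0.5` and `b = 1.0`, the bound is roughly 0.45 at `eps = 1e-2`, 0.29 at `eps = 1e-4` and 0.18 at `eps = 1e-8`, in line with a limit equal to 0.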


C. Related works with similar proof techniques 

A closely related work is the one of Dirksen [26]. In
this work, the ambient space is a (possibly infinite-dimensional) Hilbert
space and a generic theory for dimensionality reduction with
subgaussian maps is presented. This work allows one to derive
embedding results once a random linear map $L$ is given.
Theorem III.4 can for example be derived using the generic
theory developed in [26]. In our work, we provide in addition
a recipe for the construction of the linear map $L$ itself. Let us
also note that one cannot directly recover the results presented
in Section IV-B with the result presented in [26].

Another related work of Dirksen extends results
presented in [27], [28] and presents a technique to obtain tail
bounds of the supremum of a stochastic process.
Let us highlight that these results can be another approach to
obtain our generic theorem (Theorem II.2). From a more technical
point of view, this work provides bounds for all $p$-th moments
of the supremum of a stochastic process. Deviation inequalities
immediately follow from these bounds. The proofs use the
so-called generic chaining argument and the $\gamma$-functionals.
Although the approximation of these functionals might not
be easy, this method can produce sharper bounds than ours.
In our proofs, we use more basic tools of probability theory
with another type of chaining argument to directly obtain
the deviation inequalities.

Related works also exist for embeddings of manifolds in
finite ambient dimension 0, [23], [29]. In particular, the result
presented in Section IV-A generalises the result presented in
0 to any model set with finite upper box-counting dimension.
The proof technique used in 0 is also based on a chaining
argument. In 0, one of the main contributions is also the
identification of the manifold properties that allow one to
control the covering dimension. In [30], the same type of
embedding result also appears but for signals that lie on
a collection of continuously parameterised low-dimensional
subspaces in $\mathbb{R}^n$.

Even though the following result applies in a finite dimensional
Euclidean space, let us mention that Oymak et al.
proved in [31] that certain structured random matrices can
embed any low-dimensional set of $\mathbb{R}^n$. This is a particularly
useful result for practical applications as it shows that matrices
for which fast matrix-vector multiplication algorithms exist
can be used to embed any set, as long as it has a small intrinsic
dimension. It would be interesting in a future work to study
how such computationally efficient matrices can be used in
the construction of our linear map $L$.


VI. Conclusion 

We presented a generic recipe to prove that a random linear
map satisfies the RIP on arbitrary low-dimensional sets S
that live in a Hilbert space. The proposed framework is
general enough to take into account a large class of subsets S 
as well as structured and unstructured measurement processes. 
We also explained how to construct random linear maps that 
satisfy the RIP with high probability in this general setting. 


The linear map presented in Section IV-A is built in two
steps. The first step consists in a projection onto a well- 
chosen finite-dimensional subspace Ve^ and the second step 
uses a random subgaussian matrix to reduce the dimension. In 
order to obtain linear maps which are computationally more 
efficient, one can consider replacing the random matrix in the
second step by a more structured one. One possibility is to
use a variable density sampling technique, as we considered 
in Q. Note that the condition on m presented in Q is not 
optimal. Indeed for /c-sparse signals in the condition 
requires a number of measurements essentially proportional 
to to ensure that the RIP is satisfied. To understand 
the performance of the variable density sampling technique 
in practical applications, it will be important to determine 
under which additional structures this condition on m can be 
improved. 

We also saw that a finite upper box-counting dimension of
the normalised secant set $S(\Sigma)$ of $\Sigma$ is not necessary for the
existence of a stable dimension-reducing linear embedding of
$\Sigma$. An interesting question is thus what appropriate measure
of dimension of $S(\Sigma)$ is both necessary and sufficient for the
existence of a stable dimension-reducing linear embedding of
$\Sigma$.

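Appendix A
Proof of Theorem II.2

The proof of Theorem II.2 is based on a chaining argument
which is a powerful technique to obtain sharp bounds for the
supremum of random processes 0 (H) (n . Before proving
Theorem II.2 we need some preparations.

First, we notice that if the covering assumption on S holds then we can
cover the set S with a finite number of balls. Furthermore, a
bound on the number of balls sufficient to cover S is available
for all radii $\epsilon \le \epsilon_S$. In the proof below, for each $j \ge 0$,

• $\mathcal{C}_j \subset S$ denotes a minimal $(2^{-j} \epsilon_S)$-net for S;

• $\pi_j$ denotes the mapping $\pi_j(x) \in \operatorname{argmin}_{z \in \mathcal{C}_j} \|x - z\|$;

• $\mathcal{D}_j$ denotes the finite set $\{(\pi_{j+1}(x), \pi_j(x)) \mid x \in S\} \subset
\mathcal{C}_{j+1} \times \mathcal{C}_j$.

One can remark that the cardinality of $\mathcal{C}_j$ is $N_S(2^{-j} \epsilon_S)$, that
the cardinality of $\mathcal{D}_j$ is bounded above by $N_S^2(2^{-j-1} \epsilon_S)$, and
that $\|y - z\| \le 2^{-j+1} \epsilon_S$ for all $(y, z) \in \mathcal{D}_j$.

Second, we remark that the concentration assumption on $h_p$ implies that for any
fixed $x \in S$,

$$\mathbb{P}\{|h_p(x)| \ge \lambda\} \le \begin{cases} 2 e^{-c_1 m \lambda^2} & \text{if } 0 \le \lambda \le c_2/c_1, \\ 2 e^{-c_2 m \lambda} & \text{if } \lambda \ge c_2/c_1. \end{cases} \tag{25}$$

Indeed, it suffices to take $y = x$ and $z = 0$ in the two inequalities of this assumption.
With these tools in hand, we can prove the following
intermediate result from which Theorem II.2 follows.

Lemma A.1. Let $(l_1, \ldots, l_m)$ be $m$ random vectors drawn
according to a probability distribution $\mu$, $L : \mathcal{H} \to \mathbb{R}^m$
be the corresponding linear map, $\delta_p := \sup_{x \in S} |h_p(x)|$
and $\xi \in (0,1)$. Define

$$S_1(\epsilon_S, \xi) := \sqrt{\log\left(\frac{2}{\xi}\, N_S(\epsilon_S)\right)},$$

$$S_2(\epsilon_S, \xi) := \sum_{j \in \mathbb{N}} 2^{-j+1} \sqrt{\log\left(\frac{2^{j+1}}{\xi}\, N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right)\right)},$$

$$S_3(\epsilon_S, \xi) := \sum_{j \in \mathbb{N}} 2^{-j+1} \log\left(\frac{2^{j+1}}{\xi}\, N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right)\right).$$

If the concentration assumption on $h_p$ holds, then

$$\delta_p \le \frac{S_1(\epsilon_S, \xi) + \epsilon_S\, S_2(\epsilon_S, \xi)}{\sqrt{c_1 m}} + \frac{S_1^2(\epsilon_S, \xi) + \epsilon_S\, S_3(\epsilon_S, \xi)}{c_2 m}$$

with probability at least $1 - 3\xi$.

Proof: We begin by establishing the telescopic sum
expression:

$$h_p(x) = h_p(\pi_0(x)) + \sum_{j=0}^{\infty} \left[ h_p(\pi_{j+1}(x)) - h_p(\pi_j(x)) \right].$$

The above equality holds because the linear maps of the form
of $L$ are continuous with respect to $\|\cdot\|$. Therefore, $\|L(\cdot)\|_p^p$,
$\mathbb{E}\,\|L(\cdot)\|_p^p$, and thus $h_p$ are also continuous with respect to $\|\cdot\|$.
Then,

$$h_p(x) - h_p(\pi_0(x)) - \sum_{j=0}^{N} \left[ h_p(\pi_{j+1}(x)) - h_p(\pi_j(x)) \right] = h_p(x) - h_p(\pi_{N+1}(x)).$$

As $\lim_{N \to \infty} \|\pi_{N+1}(x) - x\| = 0$, we have
$\lim_{N \to \infty} |h_p(x) - h_p(\pi_{N+1}(x))| = 0$ by continuity.

We continue with the triangle inequality which yields

$$\delta_p = \sup_{x \in S} |h_p(x)| \le \sup_{x \in S} |h_p(\pi_0(x))| + \sum_{j=0}^{\infty} \sup_{x \in S} |h_p(\pi_{j+1}(x)) - h_p(\pi_j(x))|$$

$$= \max_{x_0 \in \mathcal{C}_0} |h_p(x_0)| + \sum_{j=0}^{\infty} \max_{(y,z) \in \mathcal{D}_j} |h_p(y) - h_p(z)|.$$

Let $a_j, b > 0$, with $j \in \mathbb{N}$, be parameters whose values will
be chosen later on. The union bound yields

$$\mathbb{P}\Big\{ \delta_p \ge b + \sum_{j \ge 0} a_j \Big\} \le \mathbb{P}\Big\{ \max_{x_0 \in \mathcal{C}_0} |h_p(x_0)| \ge b \Big\} + \sum_{j \ge 0} \mathbb{P}\Big\{ \max_{(y,z) \in \mathcal{D}_j} |h_p(y) - h_p(z)| \ge a_j \Big\}$$

$$\le N_S(\epsilon_S) \cdot \max_{x_0 \in \mathcal{C}_0} \mathbb{P}\{|h_p(x_0)| \ge b\} + \sum_{j \ge 0} N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right) \cdot \max_{(y,z) \in \mathcal{D}_j} \mathbb{P}\{|h_p(y) - h_p(z)| \ge a_j\}. \tag{26}$$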

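In the last step, we used the facts that the cardinality of $\mathcal{C}_0$ is
$N_S(\epsilon_S)$, and that the cardinality of $\mathcal{D}_j$ is bounded above by
$N_S^2(\epsilon_S/2^{j+1})$.

We now bound the first term on the right-hand side (rhs) of
(26). We recall that $\mathcal{C}_0 \subset S$. Inequality (25) thus yields

$$\max_{x_0 \in \mathcal{C}_0} \mathbb{P}\{|h_p(x_0)| \ge b\} \le \begin{cases} 2 e^{-c_1 m b^2} & \text{if } 0 \le b \le c_2/c_1, \\ 2 e^{-c_2 m b} & \text{if } b \ge c_2/c_1. \end{cases}$$

If

$$\frac{S_1^2(\epsilon_S, \xi)}{m} \le \frac{c_2^2}{c_1},$$

we choose

$$b := \frac{S_1(\epsilon_S, \xi)}{\sqrt{c_1 m}} = \sqrt{\frac{\log(2 N_S(\epsilon_S)/\xi)}{c_1 m}},$$

and

$$b := \frac{S_1^2(\epsilon_S, \xi)}{c_2 m} = \frac{\log(2 N_S(\epsilon_S)/\xi)}{c_2 m}$$

otherwise. This choice yields

$$N_S(\epsilon_S) \cdot \max_{x_0 \in \mathcal{C}_0} \mathbb{P}\{|h_p(x_0)| \ge b\} \le \xi \quad \text{and} \quad b \le \frac{S_1(\epsilon_S, \xi)}{\sqrt{c_1 m}} + \frac{S_1^2(\epsilon_S, \xi)}{c_2 m}$$

in both cases.

We continue by bounding the infinite sum on the rhs of (26).
We notice that for $(y, z) \in \mathcal{D}_j$, we have $\|y - z\| \le 2^{-j+1} \epsilon_S$.
The concentration assumption on $h_p$ shows that

$$\mathbb{P}\{|h_p(y) - h_p(z)| \ge a_j\} = \mathbb{P}\left\{|h_p(y) - h_p(z)| \ge \frac{a_j}{\|y - z\|}\, \|y - z\|\right\} \le \mathbb{P}\left\{|h_p(y) - h_p(z)| \ge \frac{a_j}{2^{-j+1} \epsilon_S}\, \|y - z\|\right\}$$

$$\le \begin{cases} 2 \exp\left(-c_1 m \left(\dfrac{a_j}{2^{-j+1} \epsilon_S}\right)^2\right) & \text{if } \dfrac{a_j}{2^{-j+1} \epsilon_S} \le c_2/c_1, \\[2ex] 2 \exp\left(-c_2 m\, \dfrac{a_j}{2^{-j+1} \epsilon_S}\right) & \text{if } \dfrac{a_j}{2^{-j+1} \epsilon_S} \ge c_2/c_1. \end{cases}$$

Define

$$J := \left\{ j \in \mathbb{N} \ :\ \frac{1}{m} \log\left(\frac{2^{j+1}}{\xi}\, N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right)\right) \le \frac{c_2^2}{c_1} \right\}.$$

We choose

$$a_j := 2^{-j+1} \epsilon_S \sqrt{\frac{1}{c_1 m} \log\left(\frac{2^{j+1}}{\xi}\, N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right)\right)}$$

for all $j \in J$, and

$$a_j := \frac{2^{-j+1} \epsilon_S}{c_2 m} \log\left(\frac{2^{j+1}}{\xi}\, N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right)\right)$$

for all $j \in \mathbb{N} \setminus J$. Notice that, by definition of $J$, we have

$$\frac{a_j}{2^{-j+1} \epsilon_S} \le \frac{c_2}{c_1}$$

for all $j \in J$, and

$$\frac{a_j}{2^{-j+1} \epsilon_S} \ge \frac{c_2}{c_1}$$

otherwise. Replacing $a_j$ by its expression in the above inequality shows that

$$N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right) \cdot \max_{(y,z) \in \mathcal{D}_j} \mathbb{P}\{|h_p(y) - h_p(z)| \ge a_j\} \le \frac{\xi}{2^j}$$

for all $j \in \mathbb{N}$. Therefore,

$$\sum_{j \ge 0} N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right) \cdot \max_{(y,z) \in \mathcal{D}_j} \mathbb{P}\{|h_p(y) - h_p(z)| \ge a_j\} \le \sum_{j \ge 0} \frac{\xi}{2^j} = 2\xi.$$

In total, the rhs of (26) is thus bounded by $3\xi$. Finally, we
have

$$\sum_{j \in \mathbb{N}} a_j = \sum_{j \in J} 2^{-j+1} \epsilon_S \sqrt{\frac{1}{c_1 m} \log\left(\frac{2^{j+1}}{\xi}\, N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right)\right)} + \sum_{j \in \mathbb{N} \setminus J} \frac{2^{-j+1} \epsilon_S}{c_2 m} \log\left(\frac{2^{j+1}}{\xi}\, N_S^2\!\left(\frac{\epsilon_S}{2^{j+1}}\right)\right)$$

$$\le \epsilon_S\, \frac{S_2(\epsilon_S, \xi)}{\sqrt{c_1 m}} + \epsilon_S\, \frac{S_3(\epsilon_S, \xi)}{c_2 m}.$$

In summary, we have shown that

$$\delta_p \le b + \sum_{j \ge 0} a_j$$

with probability at least $1 - 3\xi$. Using the bounds on $b$ and
$\sum_{j \ge 0} a_j$, we have

$$\delta_p \le \frac{S_1(\epsilon_S, \xi) + S_2(\epsilon_S, \xi)\, \epsilon_S}{\sqrt{c_1 m}} + \frac{S_1^2(\epsilon_S, \xi) + S_3(\epsilon_S, \xi)\, \epsilon_S}{c_2 m} \tag{27}$$

with at least the same probability. ■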
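The nets $\mathcal{C}_j$, the maps $\pi_j$ and the link sets $\mathcal{D}_j$ used in this chaining argument can be illustrated on a finite point cloud. The sketch below is an assumption-laden illustration: it uses a greedy net, which is a valid net but not necessarily a minimal one as in the proof, and all function names are hypothetical.

```python
import math


def greedy_net(points, radius):
    """Return centers (a subset of points) such that every point lies within `radius` of a center."""
    centers = []
    for p in points:
        # Add p as a new center only if no existing center already covers it.
        if all(math.dist(p, c) > radius for c in centers):
            centers.append(p)
    return centers


def nearest(p, centers):
    """Nearest-center map, playing the role of pi_j in the proof."""
    return min(centers, key=lambda c: math.dist(p, c))


def chaining_links(points, eps_s, levels):
    """Build nets at radii eps_s / 2^j and the link sets {(pi_{j+1}(x), pi_j(x))}."""
    nets = [greedy_net(points, eps_s / 2 ** j) for j in range(levels + 1)]
    links = []
    for j in range(levels):
        pairs = {(nearest(p, nets[j + 1]), nearest(p, nets[j])) for p in points}
        links.append(pairs)
    return nets, links
```

By the triangle inequality, every link $(y, z)$ at level $j$ satisfies $\|y - z\| \le 2^{-j-1}\epsilon_S + 2^{-j}\epsilon_S \le 2^{-j+1}\epsilon_S$, which is the bound used for $\mathcal{D}_j$ above.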

We can now prove Theorem II.2. First, we need to compute
the sums $S_1$, $S_2$, $S_3$ in the above lemma. It involves uninteresting
computations which, for completeness, we detail in
Appendix B. For $\xi \in (0,1)$ and $\epsilon_S \in (0,1/2)$, if the covering assumption
on S holds, these computations show that

$$S_1(\epsilon_S, \xi) \le \sqrt{\log(2/\xi)} + \sqrt{s \log(1/\epsilon_S)},$$

$$S_2(\epsilon_S, \xi) \le 8\sqrt{\log(2/\xi)} + 8\sqrt{2 s \log(2)} + 4\sqrt{2 s \log(1/\epsilon_S)},$$

$$S_1^2(\epsilon_S, \xi) \le \log(2/\xi) + s \log(1/\epsilon_S),$$

$$S_3(\epsilon_S, \xi) \le 8 \log(2/\xi) + 16 s \log(2) + 8 s \log(1/\epsilon_S).$$
Therefore, 


s ^ ^/ log(V 0 I ^ slog(l/e 5 ) 


cim 


cim 


. 8 ,^./! 2 £M) +8, /?£Ma 

y Cim y cim y 

log (2/0 , slog(l/e5) 


Cim 


C 2 m 


C 2 m 


^ 865 log (2/0 16e5 s log(2) 865 slog(l/e5) 


C 2 m 


C 2 m 


C 2 m 


with probability at least 1 — 3^. 

Let 6 G (0,1) and define D = 1/min(ci,C 2 ). We remark 
that if 


m > ^|^log(2/0. 


(28) 
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then, since < 1/2, 


We also have, for es G (0,1/2), 




cim 


cim 


and 


C2m 

Similarly, if 


C2m 


SD 

^ log(l/e5), 


then, since es < 1/2, 




Cim 


Cim 


.log(lfe) 


C2m 

Finally, if 


C2m 


321^ , 

m ^ s log(2), 


then, since < 1/2, 


8, /5£M2 ^ j and ^ ^ 

y cim C2m 

To terminate the proof, we notice that if 


D 

m ^ ^ max 
0^ 


{a log (1), log (I)}, 


< yiog (2) 2 + 2v1og(270 

j=0 

yiog (2) + yiog (2/^)j < 4v'log(2/C). 


= 2 


Similarly, 

oo 

y^2“-^ log (2-J+V?) ^ 2 [log(2) + log(2/^)] < 41og(2/^). 


^2-yiog ( 22 (i+i)«e 52 *) 


j=0 


= ^2 ■^y2(j + l)slog(2) + 2 slog(l/e 5 )) 


j=0 




V'2sl0g(2) ^2-2V/Ti + 2^/25 log(l/e5) 


(29) 


i=o 

oo 


(30) 


< \/2slog(2)y^2 -^(j + 1) + 2y2slog(l/e5) 

j=0 

= 4:^/2s\og{2) + 2y2slog(l/e5). 

Similarly, 

oo 

2“2 log < 8s log(2) + 4s log(l/es). 

j=o 

If Assumption holds, we have 

for all j e N. Therefore, using the pre-computations above, 
we obtain 


with D := 321 min(ci,C 2 ), holds then (28), (29), (30) all hold. 
Under this condition, we have $\delta_p \le 10\delta$, with probability at
least $1 - 3\xi$. Two changes of variables ($\xi' = 3\xi$, $\delta' = 10\delta$)
prove the theorem with $C = 3200$.

Appendix B 

Evaluation of $S_1$, $S_2$, $S_3$ of Lemma A.1

To estimate the sums $S_1$, $S_2$, $S_3$, we start by noticing
that ^ Section 4.2.3 in |[^). We 

precompute, for ^ G (0,1), 

00 00 

Glog (2^+70 < y] 2" Vi log (2) + log (2/^) 

j=0 j=0 

00 00 

< Y G/iog (2 )+Y yiog (2/c) 

j=0 j=0 


5'2(es ,0 < 8\/log (2/^) + 8Vslog(2) + 4\/2slog(l/es), 
and 

Ssi^s, 6 < 8 log (2/^) + 16s log(2) + 8s log(l/e 5 ). 

We also have 

> 51 ( 65,0 < log (2/0 + slog(l/e 5 ), 

and 

5i(e5,0 < yiog (2/0 + V log(l/e5). 
Appendix C

Proof of Theorem III.3 and Theorem III.4
A. Basic tools

In this section, we use several properties of subgaussian
and subexponential random vectors/variables (see Definition
III.1 and Definition III.2). We let the reader refer to, e.g., fT^
for more information about them. We recall here one useful
property.

• If $X$ is a subexponential random variable then so is
$X - \mathbb{E}X$, and we have $\|X - \mathbb{E}X\|_{\psi_1} \le 2\,\|X\|_{\psi_1}$
(Remark 5.18, (T^).

We will also use the following Bernstein-type inequality for
subexponential random variables.

Lemma C.1 (Corollary 5.17, p^). There exists an absolute
constant $c > 0$ such that for independent centered subexponential
random variables $X_1, \ldots, X_m$ with subexponential norm
bounded by $K > 0$, we have, for every $0 \le t \le K$,

$$\mathbb{P}\left\{ \left| \frac{1}{m} \sum_{i=1}^{m} X_i \right| \ge t \right\} \le 2 \exp\left( -\frac{c\, m\, t^2}{K^2} \right),$$

and, for every $t \ge K$,

$$\mathbb{P}\left\{ \left| \frac{1}{m} \sum_{i=1}^{m} X_i \right| \ge t \right\} \le 2 \exp\left( -\frac{c\, m\, t}{K} \right).$$
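The two regimes of Lemma C.1 can be combined into a single expression, since $\min(t^2/K^2, t/K)$ equals $t^2/K^2$ for $t \le K$ and $t/K$ for $t \ge K$. The sketch below evaluates this bound; the absolute constant $c$ is not specified in the lemma, so the default `c=1.0` is a placeholder assumption, and the function name is hypothetical.

```python
import math


def bernstein_tail_bound(t, m, K, c=1.0):
    """Evaluate 2 * exp(-c * m * min(t^2 / K^2, t / K)), capped at 1.

    c is a placeholder for the unspecified absolute constant of the lemma.
    """
    if t < 0 or m <= 0 or K <= 0:
        raise ValueError("need t >= 0, m > 0 and K > 0")
    ratio = t / K
    exponent = c * m * min(ratio ** 2, ratio)
    return min(1.0, 2.0 * math.exp(-exponent))
```

The bound is trivial (equal to 1) at `t = 0`, decays like a Gaussian tail for small deviations and like an exponential tail for large ones.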




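B. Proof of Theorem III.3

The independent random vectors $a_1, \ldots, a_m \in \mathbb{R}^d$ are
assumed to be such that

$$\Lambda_1 := \sup_{\substack{x \in \{S - S\} \cup S \\ x \ne 0}} \left\{ \left\|a_i^{\mathsf T} b(x)\right\|_{\psi_1} / \|b(x)\|_2 \right\} < +\infty.$$

Therefore, for all $p \ge 1$, we have

$$\left(\mathbb{E}\,|a_i^{\mathsf T} b(x)|^p\right)^{1/p} \le \Lambda_1\, p\, \|b(x)\|_2, \tag{31}$$

for any $x$ in $\{S - S\} \cup S$.

Let $y, z$ be two fixed vectors in $S \cup \{0\}$. Define $X :=
\sum_{i=1}^{m} X_i$ where

$$X_i := |a_i^{\mathsf T} b(y)| - |a_i^{\mathsf T} b(z)| - \mathbb{E}\left(|a_i^{\mathsf T} b(y)| - |a_i^{\mathsf T} b(z)|\right).$$

It is clear that $X$ is a sum of $m$ independent centered random
variables. To use Lemma C.1 we need to show that these
variables are subexponential and bound their subexponential
norm. We have

$$\left(\mathbb{E}\,\big||a_i^{\mathsf T} b(y)| - |a_i^{\mathsf T} b(z)|\big|^p\right)^{1/p} \le \left(\mathbb{E}\,|a_i^{\mathsf T} (b(y) - b(z))|^p\right)^{1/p} = \left(\mathbb{E}\,|a_i^{\mathsf T} b(y - z)|^p\right)^{1/p}$$

$$\le \Lambda_1\, p\, \|b(y - z)\|_2 \le \Lambda_1\, p\, \|y - z\|.$$

The second step follows from the linearity of $b$. The third step
follows from (31) and the fact that $y - z \in \{S - S\} \cup S$. The
last step follows from the inequality $\|b(x)\|_2 \le \|x\|$.

We deduce that $X_1, \ldots, X_m$ are independent centered
subexponential random variables with subexponential norm
bounded by $2\Lambda_1 \|y - z\|$. Observe that

$$h_1(y) - h_1(z) = \frac{1}{m} \sum_{i=1}^{m} X_i.$$

Lemma C.1 with $K = 2\Lambda_1 \|y - z\|$ yields

$$\mathbb{P}\{|h_1(y) - h_1(z)| \ge \lambda\, \|y - z\|\} \le 2 e^{-c_1 m \lambda^2}$$

for every $0 \le \lambda \le 2\Lambda_1$, and

$$\mathbb{P}\{|h_1(y) - h_1(z)| \ge \lambda\, \|y - z\|\} \le 2 e^{-c_2 m \lambda}$$

for every $\lambda \ge 2\Lambda_1$, with $c_1 = c/(4\Lambda_1^2)$ and $c_2 = c/(2\Lambda_1)$.
Note that $c_2/c_1 = 2\Lambda_1$. Hence, the concentration assumption of Theorem II.2 is satisfied.
Theorem II.2 terminates the proof.

C. Proof of Theorem III.4

The independent random vectors $a_1, \ldots, a_m \in \mathbb{R}^d$ are
assumed to be such that

$$\Lambda_2 := \sup_{\substack{x \in \{S - S\} \cup S \\ x \ne 0}} \left\{ \left\|a_i^{\mathsf T} b(x)\right\|_{\psi_2} / \|b(x)\|_2 \right\} < +\infty.$$

Therefore, for all $p \ge 1$, we have

$$\left(\mathbb{E}\,|a_i^{\mathsf T} b(x)|^p\right)^{1/p} \le \Lambda_2 \sqrt{p}\, \|b(x)\|_2, \tag{32}$$

for any $x$ in $\{S - S\} \cup S$.

Let $y, z$ be two fixed vectors in $S \cup \{0\}$. Define $X :=
\sum_{i=1}^{m} X_i$ where

$$X_i := |a_i^{\mathsf T} b(y)|^2 - |a_i^{\mathsf T} b(z)|^2 - \mathbb{E}\left(|a_i^{\mathsf T} b(y)|^2 - |a_i^{\mathsf T} b(z)|^2\right).$$

It is clear that $X$ is a sum of $m$ independent centered random
variables. To use Lemma C.1 we need to show that these
variables are subexponential and bound their subexponential
norm. We have

$$\left(\mathbb{E}\,\big||a_i^{\mathsf T} b(y)|^2 - |a_i^{\mathsf T} b(z)|^2\big|^p\right)^{1/p} = \left(\mathbb{E}\left(|a_i^{\mathsf T} (b(y) + b(z))|^p\, |a_i^{\mathsf T} (b(y) - b(z))|^p\right)\right)^{1/p}$$

$$\le \left(\mathbb{E}\,|a_i^{\mathsf T} b(y + z)|^{2p}\right)^{1/(2p)} \left(\mathbb{E}\,|a_i^{\mathsf T} b(y - z)|^{2p}\right)^{1/(2p)}.$$

In the last step, we used the Cauchy-Schwarz inequality and
the linearity of $b$. Using (32) and the facts that $(y \pm z) \in
\{S - S\} \cup S$ (remark that S is symmetric, therefore $S + S = S - S$), we obtain

$$\left(\mathbb{E}\,|a_i^{\mathsf T} b(y \pm z)|^{2p}\right)^{1/(2p)} \le \Lambda_2 \sqrt{2p}\, \|b(y \pm z)\|_2 \le \Lambda_2 \sqrt{2p}\, \|y \pm z\| \le \begin{cases} 2\sqrt{2p}\, \Lambda_2, & \text{for } y + z, \\ \sqrt{2p}\, \Lambda_2\, \|y - z\|, & \text{for } y - z. \end{cases}$$

In the second step, we used the inequality $\|b(x)\|_2 \le \|x\|$. In the last step,
we used the fact that $\|y\| \le 1$ and $\|z\| \le 1$. We thus have

$$\left(\mathbb{E}\,\big||a_i^{\mathsf T} b(y)|^2 - |a_i^{\mathsf T} b(z)|^2\big|^p\right)^{1/p} \le 4 \Lambda_2^2\, p\, \|y - z\|.$$

Therefore, $X_1, \ldots, X_m$ are independent centered subexponential
random variables with subexponential norm bounded by
$8\Lambda_2^2 \|y - z\|$. Observe that

$$h_2(y) - h_2(z) = \frac{1}{m} \sum_{i=1}^{m} X_i.$$

Lemma C.1 with $K = 8\Lambda_2^2 \|y - z\|$ yields

$$\mathbb{P}\{|h_2(y) - h_2(z)| \ge \lambda\, \|y - z\|\} \le 2 e^{-c_1 m \lambda^2}$$

for every $0 \le \lambda \le 8\Lambda_2^2$, and

$$\mathbb{P}\{|h_2(y) - h_2(z)| \ge \lambda\, \|y - z\|\} \le 2 e^{-c_2 m \lambda}$$

for every $\lambda \ge 8\Lambda_2^2$, with $c_1 = c/(64\Lambda_2^4)$ and $c_2 = c/(8\Lambda_2^2)$.
Note that $8\Lambda_2^2 = c_2/c_1$. Hence, the concentration assumption of Theorem II.2 is satisfied.
Theorem II.2 terminates the proof.

Appendix D

Random rank-one projections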

A. Estimation of $\Lambda_1$ from Section IV-B

Let $M$ be a fixed matrix in $\mathbb{R}^{n_1 \times n_2}$ and $a \in \mathbb{R}^{n_1}$, $b \in \mathbb{R}^{n_2}$
be vectors whose entries are independent copies of the
random variable $N$ or $P$, defined in (22). In (T^, the authors
show that $a^{\mathsf T} M b$ is a subexponential random variable. In this
section, we estimate the parameter $\Lambda_1$ needed in the recipe.
This estimation follows from the lemma below, whose proof
starts from intermediate results presented in |T^ .

Before presenting the lemma, we recall the definition of the
double factorial for odd numbers: $(2k - 1)!! = (2k)!/(2^k\, k!)$
for all $k \ge 1$. Using Stirling’s formula for the factorial, we
obtain

$$(2k - 1)!! = \frac{\sqrt{2\pi (2k)}\, (2k/e)^{2k}\, e^{\lambda_{2k}}}{2^k \sqrt{2\pi k}\, (k/e)^k\, e^{\lambda_k}} = \sqrt{2}\; 2^k (k/e)^k\, e^{\lambda_{2k} - \lambda_k}, \tag{33}$$

where $1/(12k + 1) \le \lambda_k \le 1/(12k)$.
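Identity (33) and the resulting two-sided bounds are easy to verify numerically. The sketch below is illustrative; the function names are hypothetical, and the lower bound uses $e^{\lambda_{2k} - \lambda_k} \ge e^{-1/(12k)}$, which follows from the stated range of $\lambda_k$.

```python
import math


def double_factorial_odd(k):
    """(2k - 1)!! computed exactly as (2k)! / (2^k * k!)."""
    return math.factorial(2 * k) // (2 ** k * math.factorial(k))


def stirling_upper(k):
    """Upper bound sqrt(2) * 2^k * (k / e)^k, valid because lambda_{2k} - lambda_k <= 0."""
    return math.sqrt(2.0) * 2 ** k * (k / math.e) ** k


def stirling_lower(k):
    """Lower bound obtained from lambda_{2k} - lambda_k >= -1 / (12 k)."""
    return stirling_upper(k) * math.exp(-1.0 / (12.0 * k))
```

For example, `double_factorial_odd(3)` equals 15, while `stirling_upper(3)` is about 15.2.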

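Lemma D.1. Let $a \in \mathbb{R}^{n_1}$ and $b \in \mathbb{R}^{n_2}$ be vectors whose
entries are independent copies of a random variable $X \in \mathbb{R}$.
Define

$$a_X := \sup_{k \ge 1} \left[ \frac{\mathbb{E}\, X^{2k}}{(2k - 1)!!} \right]^{1/(2k)}.$$

Assume that $1 \le a_X < \infty$. For any fixed matrix $M$ in $\mathbb{R}^{n_1 \times n_2}$
and all $p \ge 1$,

$$\left(\mathbb{E}\,|a^{\mathsf T} M b|^p\right)^{1/p} \le 2^{3/(2p)}\, e^{-1}\, p\, a_X^2\, \|M\|_{\mathrm{Frob}}.$$

As a consequence, $a^{\mathsf T} M b$ is a subexponential random variable
with subexponential norm bounded by $C a_X^2 \|M\|_{\mathrm{Frob}}$ where
$C = 2^{3/2}\, e^{-1} \approx 1.04 > 1$.

The proof of this lemma is given below. From the above
result, we see that we only need to estimate $a_X$ with $X = N$
and $X = P$ to estimate the constant $\Lambda_1$ in Section IV-B.

• [Case where $X = N$] As $N$ is a standard normal random
variable, we have $\mathbb{E}\, N^{2k} = (2k - 1)!!$. Therefore, $a_N = 1$
and $\Lambda_1 \le C$ where $C > 1$ is an absolute constant.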

• [Case where A = P] It is easy to notice that E P^^ = 

q^/q. Using (33) and the fact that ^ 2/2, we 

obtain 

.l/(2/c) 


Ep2/c 


(2A:-1)!! 






V2 2-^ {k/e)-^ q^/q 


.l/(2/c) 


V2i 


— 1 o — k 


(fc/e) 


-k 


.l/(2/c) 


7 V 2 


Therefore, ap ^ 1.39 ^/q and Ai ^ C ^/q where C > 1 
is an absolute constant. 

Proof: We start with inequality (0.20) in the supplemen¬ 
tary material of |T^, which shows that, for all /c ^ 1, 


E \a^W\br [(2A:-1)!!]^||M|| 


Frob • 


Using (33) and the fact that ^ 2 , we obtain {2k — 

1 )!! < V2 2'=(fc/e)''. Therefore, 


E ^ 2 {2aj^/ep k' 


2k i2k 


2k 

Frob ’ 


for all k ^ 1. To shorten notations, we define 

CaxM (2^x/^) ll^llFrob* 

We thus have 

We now follow exactly the interpolation technique used in 
the proof of Corollary 8.7 in ||^. Let G [ 0 , 1 ]. Then, 

< (2Cf,.M (fc + l)2('=+i))' 

/ j , X 20(1-0) 




m 


^ 2 V 2 [k + 

Holder’s inequality yielded the first step. In the second step 
from below, we used the fact that 

(1 - 0) log{k) + 0 log{k + 1) ^ log ((1 - 0)k + 0{k + 1)), 

which follows from the concavity of the logarithm. In the last 
step, we used the facts that {k-\-l)/k ^ 2 and that 26>(1 — ^) ^ 
1/2. Taking p = 2{k + 0) proves the lemma for p ^ 2. 

For 0 < p ^ 2, we have 

(E|aTM6r)i/p ^ (E|aTM6|2)V2 = 

<23/(2p) e-ipa3rl|M||Frob- 

In the first step, we used the fact that for any random 
variable A, (E|A|^)^/^ ^ (E|A|^)^/^ for 0 < p ^ g. 
The last inequality follows from the fact that the minimum 
of /(p) = p is attained atpo = 31og(2)/2 and that 

^xfiPo) ~ 1-0397 a^ ^ 1- The lemma is thus proved for 
all p > 0. ■ 


B. Bound on 6^ 

Let us first introduce the following lemma which bounds 
Eja^xl for isotropic subexponential random vectors a e 
and any fixed vector x 

Lemma D.2. Let a G be a random vector such that for 
any fixed vector x G 

E |a''’x|^ = ||x ||2 and ||a''’x||^^ ^ C ||x ||2 , 
where C ^ 2 is a constant. Then, 

2e»C 

The proof of this lemma is given below. To estimate , we 
thus notice that we only need to estimate the constant C in the 


above lemma. For rank-one projections. Lemma D.l indicates 
that C ^ 0{a\). 
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Proof: The upper bound follows from Jensen’s inequality, 
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E la'^xf = E 
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(E la'^x 
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E \a^x\ > 
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e6(o.i) 
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minimum of g is attained at 6 >* satisfying 
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This gives 
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1 + 


[log(C) + log (2 + 2 log(C))] ^ 5 ( 6 »*), 


log(C) 
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3 ^ 
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